Update README.md

2b25b52 verified 4 days ago

10.5 kB

	---
	license: apache-2.0
	datasets:
	- nebius/SWE-rebench-openhands-trajectories
	base_model:
	- Qwen/Qwen3-30B-A3B-Instruct-2507
	pipeline_tag: text-generation
	library_name: transformers
	tags:
	- code
	- agent
	---

	# Model Summary

	SWE-rebench-openhands-Qwen3-30B-A3B is a 30B Rejection Sampling Fine-Tuning (RFT) checkpoint derived from
	[Qwen/Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507), trained on the newly released
	[nebius/SWE-rebench-openhands-trajectories](https://huggingface.co/datasets/nebius/SWE-rebench-openhands-trajectories) dataset.
	Training used a maximum sequence length of 131k tokens.

	<table>
	<thead>
	<tr>
	<th rowspan="2">Model</th>
	<th rowspan="2">Size</th>
	<th colspan="4" style="background-color: #fff3cd;">Maximum Number of Turns = 100</th>
	<th colspan="4" style="background-color: #d4edda;">Maximum Number of Turns = 500</th>
	</tr>
	<tr>
	<th style="background-color: #fff3cd;">Pass@1</th>
	<th style="background-color: #fff3cd;">Pass@5</th>
	<th style="background-color: #fff3cd;">Pass@1</th>
	<th style="background-color: #fff3cd;">Pass@5</th>
	<th style="background-color: #d4edda;">Pass@1</th>
	<th style="background-color: #d4edda;">Pass@5</th>
	<th style="background-color: #d4edda;">Pass@1</th>
	<th style="background-color: #d4edda;">Pass@5</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td colspan="10"><strong>30B scale</strong></td>
	</tr>
	<tr>
	<td><a href="https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507">Qwen3-30B-A3B-Instruct-2507</a></td>
	<td>30B</td>
	<td style="background-color: #fff3cd;text-align: center;">25.2</td>
	<td style="background-color: #fff3cd;text-align: center;">44.8</td>
	<td style="background-color: #fff3cd;text-align: center;">11.8</td>
	<td style="background-color: #fff3cd;text-align: center;">24.4</td>
	<td style="background-color: #d4edda;text-align: center;">25.7</td>
	<td style="background-color: #d4edda;text-align: center;">44.2</td>
	<td style="background-color: #d4edda;text-align: center;">14.2</td>
	<td style="background-color: #d4edda;text-align: center;">26.5</td>
	</tr>
	<tr>
	<td><a href="https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct">Qwen3-Coder-30B-A3B-Instruct</a></td>
	<td>30B</td>
	<td style="background-color: #fff3cd;text-align: center;"><strong>51.9</strong></td>
	<td style="background-color: #fff3cd;text-align: center;"><strong>67.3</strong></td>
	<td style="background-color: #fff3cd;text-align: center;"><strong>28.7</strong></td>
	<td style="background-color: #fff3cd;text-align: center;"><strong>42.8</strong></td>
	<td style="background-color: #d4edda;text-align: center;"><strong>50.0</strong></td>
	<td style="background-color: #d4edda;text-align: center;">63.0</td>
	<td style="background-color: #d4edda;text-align: center;"><strong>28.1</strong></td>
	<td style="background-color: #d4edda;text-align: center;"><strong>38.7</strong></td>
	</tr>
	<tr style="background-color: #ebeced">
	<td style="color: black;">nebius/SWE-rebench-openhands-Qwen3-30B-A3B (Ours)</td>
	<td>30B</td>
	<td style="background-color: #ffdf80;text-align: center;">49.7<br/>(+24.5)</td>
	<td style="background-color: #ffdf80;text-align: center;">65.4<br/>(+20.6)</td>
	<td style="background-color: #ffdf80;text-align: center;">28.1<br/>(+16.3)</td>
	<td style="background-color: #ffdf80;text-align: center;">38.7<br/>(+14.3)</td>
	<td style="background-color: #9df2b3;text-align: center;"><strong>50.3</strong><br/>(+24.6)</td>
	<td style="background-color: #9df2b3;text-align: center;"><strong>68.3</strong><br/>(+24.1)</td>
	<td style="background-color: #9df2b3;text-align: center;"><strong>28.1</strong><br/>(+13.9)</td>
	<td style="background-color: #9df2b3;text-align: center;"><strong>38.7</strong><br/>(+12.2)</td>
	</tr>
	<tr>
	<td colspan="10"><strong>100B+ scale</strong></td>
	</tr>
	<tr>
	<td><a href="https://huggingface.co/zai-org/GLM-4.5-Air">GLM-4.5-Air</a></td>
	<td>106B</td>
	<td style="background-color: #fff3cd;text-align: center;">58.2</td>
	<td style="background-color: #fff3cd;text-align: center;">73.5</td>
	<td style="background-color: #fff3cd;text-align: center;">33.8</td>
	<td style="background-color: #fff3cd;text-align: center;">42.8</td>
	<td style="background-color: #d4edda;text-align: center;">-</td>
	<td style="background-color: #d4edda;text-align: center;">-</td>
	<td style="background-color: #d4edda;text-align: center;">-</td>
	<td style="background-color: #d4edda;text-align: center;">-</td>
	</tr>
	<tr>
	<td colspan="10"><strong>200B+ scale</strong></td>
	</tr>
	<tr>
	<td><a href="https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507">Qwen3-235B-A22B-Instruct-2507</a></td>
	<td>235B</td>
	<td style="background-color: #fff3cd;text-align: center;">45.2</td>
	<td style="background-color: #fff3cd;text-align: center;">65.9</td>
	<td style="background-color: #fff3cd;text-align: center;">29.3</td>
	<td style="background-color: #fff3cd;text-align: center;">44.8</td>
	<td style="background-color: #d4edda;text-align: center;">46.2</td>
	<td style="background-color: #d4edda;text-align: center;">67.5</td>
	<td style="background-color: #d4edda;text-align: center;">25.3</td>
	<td style="background-color: #d4edda;text-align: center;">40.8</td>
	</tr>
	<tr>
	<td style="color: black;"><a href="https://huggingface.co/nebius/SWE-rebench-openhands-Qwen3-235B-A22B">nebius/SWE-rebench-openhands-Qwen3-235B-A22B</a> (Ours)</td>
	<td>235B</td>
	<td style="background-color: #fff3cd;text-align: center;"><strong>59.9</strong><br/>(+14.7)</td>
	<td style="background-color: #fff3cd;text-align: center;"><strong>73.9</strong><br/>(+8.0)</td>
	<td style="background-color: #fff3cd;text-align: center;"><strong>35.1</strong><br/>(+5.8)</td>
	<td style="background-color: #fff3cd;text-align: center;"><strong>46.9</strong><br/>(+2.1)</td>
	<td style="background-color: #d4edda;text-align: center;"><strong>61.7</strong><br/>(+15.5)</td>
	<td style="background-color: #d4edda;text-align: center;"><strong>74.3</strong><br/>(+6.8)</td>
	<td style="background-color: #d4edda;text-align: center;"><strong>34.2</strong><br/>(+8.9)</td>
	<td style="background-color: #d4edda;text-align: center;"><strong>44.8</strong><br/>(+4.0)</td>
	</tr>
	<tr>
	<td colspan="10"><strong>300B+ scale</strong></td>
	</tr>
	<tr>
	<td><a href="https://huggingface.co/zai-org/GLM-4.5">GLM-4.5</a></td>
	<td>355B</td>
	<td style="background-color: #fff3cd;text-align: center;">64.4</td>
	<td style="background-color: #fff3cd;text-align: center;">76.2</td>
	<td style="background-color: #fff3cd;text-align: center;">33.8</td>
	<td style="background-color: #fff3cd;text-align: center;">44.8</td>
	<td style="background-color: #d4edda;text-align: center;">-</td>
	<td style="background-color: #d4edda;text-align: center;">-</td>
	<td style="background-color: #d4edda;text-align: center;">-</td>
	<td style="background-color: #d4edda;text-align: center;">-</td>
	</tr>
	<tr>
	<td><a href="https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct">Qwen3-Coder-480B-A35B-Instruct</a></td>
	<td>480B</td>
	<td style="background-color: #fff3cd;text-align: center;">64.7</td>
	<td style="background-color: #fff3cd;text-align: center;">75.8</td>
	<td style="background-color: #fff3cd;text-align: center;">36.3</td>
	<td style="background-color: #fff3cd;text-align: center;">44.8</td>
	<td style="background-color: #d4edda;text-align: center;">66.5</td>
	<td style="background-color: #d4edda;text-align: center;">77.8</td>
	<td style="background-color: #d4edda;text-align: center;">35.5</td>
	<td style="background-color: #d4edda;text-align: center;">42.8</td>
	</tr>
	</tbody>
	</table>

	Table 1. Pass@1 (averaged over 5 runs) and Pass@5 for OpenHands agent with the maximum number of turns set to 100
	(highlighted in <span style="background-color: #fff3cd; padding: 4px;">yellow</span>) and 500
	(highlighted in <span style="background-color: #d4edda; padding: 4px;">green</span>). Metrics are reported in percentages.
	Deltas vs base models are shown in parentheses for fine-tuned models.

	We explicitly excluded all [SWE-bench Verified](https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified) and
	[SWE-rebench September](https://huggingface.co/datasets/nebius/SWE-rebench-leaderboard) issues from training to avoid contamination.
	SWE-rebench Verified was additionally decontaminated on repository level.

	When evaluated with the OpenHands (v0.54.0) agent, our 30B model:

	* Substantially improves over the base Qwen3-30B-A3B-Instruct-2507 model, with +24.6 Pass@1 and +24.1 Pass@5 gains at 500-turn settings.
	* Matches or surpasses the specialized Qwen3-Coder-30B-A3B-Instruct baseline at 500 turns on SWE-bench Verified (50.3% vs 50.0% Pass@1; 68.3% vs 63.0% Pass@5).
	* Generalizes to longer interaction horizons, despite training on trajectories capped at 100 turns.

	For more details see our report in [Nebius blog](https://nebius.com/blog/posts/openhands-trajectories-with-qwen3-coder-480b).

	---

	# Best Practices

	1. Deployment:
	* Use the following configuration to serve the model with vLLM:
	```bash
	VLLM_USE_V1=1 vllm serve nebius/SWE-rebench-openhands-Qwen3-30B-A3B
	--tensor-parallel-size 8
	--served-model-name qwen_3_instruct_2507
	--disable-log-requests
	--enable-prefix-caching
	--max-model-len 131072
	--enable-auto-tool-choice
	--tool-call-parser hermes
	```
	Tested using `vllm/vllm-openai:v0.9.0` Docker image.

	2. Sampling Parameters:
	* For optimal performance, we recommend `Temperature=0.7`, `TopP=0.8`, `TopK=20`, and `MinP=0`
	that are consistent with the base model.

	---

	# Citation

	```
	@article{trofimova2025openhandstrajs,
	title={OpenHands Trajectories with Qwen3-Coder-480B-A35B-Instruct},
	author={Trofimova, Maria and Shevtsov, Anton and Ibragim, Badertdinov and Pyaev, Konstantin and Karasik, Simon and Golubev, Alexander},
	year={2025},
	journal={Nebius blog},
	note={}
	}
	```