ibragim-bad's picture
Update README.md
2b25b52 verified
---
license: apache-2.0
datasets:
- nebius/SWE-rebench-openhands-trajectories
base_model:
- Qwen/Qwen3-30B-A3B-Instruct-2507
pipeline_tag: text-generation
library_name: transformers
tags:
- code
- agent
---
# Model Summary
**SWE-rebench-openhands-Qwen3-30B-A3B** is a 30B Rejection Sampling Fine-Tuning (RFT) checkpoint derived from
[Qwen/Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507), trained on the newly released
[nebius/SWE-rebench-openhands-trajectories](https://huggingface.co/datasets/nebius/SWE-rebench-openhands-trajectories) dataset.
Training used a maximum sequence length of 131k tokens.
<table>
<thead>
<tr>
<th rowspan="2">Model</th>
<th rowspan="2">Size</th>
<th colspan="4" style="background-color: #fff3cd;">Maximum Number of Turns = 100</th>
<th colspan="4" style="background-color: #d4edda;">Maximum Number of Turns = 500</th>
</tr>
<tr>
<th style="background-color: #fff3cd;">Pass@1</th>
<th style="background-color: #fff3cd;">Pass@5</th>
<th style="background-color: #fff3cd;">Pass@1</th>
<th style="background-color: #fff3cd;">Pass@5</th>
<th style="background-color: #d4edda;">Pass@1</th>
<th style="background-color: #d4edda;">Pass@5</th>
<th style="background-color: #d4edda;">Pass@1</th>
<th style="background-color: #d4edda;">Pass@5</th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="10"><strong>30B scale</strong></td>
</tr>
<tr>
<td><a href="https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507">Qwen3-30B-A3B-Instruct-2507</a></td>
<td>30B</td>
<td style="background-color: #fff3cd;text-align: center;">25.2</td>
<td style="background-color: #fff3cd;text-align: center;">44.8</td>
<td style="background-color: #fff3cd;text-align: center;">11.8</td>
<td style="background-color: #fff3cd;text-align: center;">24.4</td>
<td style="background-color: #d4edda;text-align: center;">25.7</td>
<td style="background-color: #d4edda;text-align: center;">44.2</td>
<td style="background-color: #d4edda;text-align: center;">14.2</td>
<td style="background-color: #d4edda;text-align: center;">26.5</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct">Qwen3-Coder-30B-A3B-Instruct</a></td>
<td>30B</td>
<td style="background-color: #fff3cd;text-align: center;"><strong>51.9</strong></td>
<td style="background-color: #fff3cd;text-align: center;"><strong>67.3</strong></td>
<td style="background-color: #fff3cd;text-align: center;"><strong>28.7</strong></td>
<td style="background-color: #fff3cd;text-align: center;"><strong>42.8</strong></td>
<td style="background-color: #d4edda;text-align: center;"><strong>50.0</strong></td>
<td style="background-color: #d4edda;text-align: center;">63.0</td>
<td style="background-color: #d4edda;text-align: center;"><strong>28.1</strong></td>
<td style="background-color: #d4edda;text-align: center;"><strong>38.7</strong></td>
</tr>
<tr style="background-color: #ebeced">
<td style="color: black;">nebius/SWE-rebench-openhands-Qwen3-30B-A3B (Ours)</td>
<td>30B</td>
<td style="background-color: #ffdf80;text-align: center;">49.7<br/>(+24.5)</td>
<td style="background-color: #ffdf80;text-align: center;">65.4<br/>(+20.6)</td>
<td style="background-color: #ffdf80;text-align: center;">28.1<br/>(+16.3)</td>
<td style="background-color: #ffdf80;text-align: center;">38.7<br/>(+14.3)</td>
<td style="background-color: #9df2b3;text-align: center;"><strong>50.3</strong><br/>(+24.6)</td>
<td style="background-color: #9df2b3;text-align: center;"><strong>68.3</strong><br/>(+24.1)</td>
<td style="background-color: #9df2b3;text-align: center;"><strong>28.1</strong><br/>(+13.9)</td>
<td style="background-color: #9df2b3;text-align: center;"><strong>38.7</strong><br/>(+12.2)</td>
</tr>
<tr>
<td colspan="10"><strong>100B+ scale</strong></td>
</tr>
<tr>
<td><a href="https://huggingface.co/zai-org/GLM-4.5-Air">GLM-4.5-Air</a></td>
<td>106B</td>
<td style="background-color: #fff3cd;text-align: center;">58.2</td>
<td style="background-color: #fff3cd;text-align: center;">73.5</td>
<td style="background-color: #fff3cd;text-align: center;">33.8</td>
<td style="background-color: #fff3cd;text-align: center;">42.8</td>
<td style="background-color: #d4edda;text-align: center;">-</td>
<td style="background-color: #d4edda;text-align: center;">-</td>
<td style="background-color: #d4edda;text-align: center;">-</td>
<td style="background-color: #d4edda;text-align: center;">-</td>
</tr>
<tr>
<td colspan="10"><strong>200B+ scale</strong></td>
</tr>
<tr>
<td><a href="https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507">Qwen3-235B-A22B-Instruct-2507</a></td>
<td>235B</td>
<td style="background-color: #fff3cd;text-align: center;">45.2</td>
<td style="background-color: #fff3cd;text-align: center;">65.9</td>
<td style="background-color: #fff3cd;text-align: center;">29.3</td>
<td style="background-color: #fff3cd;text-align: center;">44.8</td>
<td style="background-color: #d4edda;text-align: center;">46.2</td>
<td style="background-color: #d4edda;text-align: center;">67.5</td>
<td style="background-color: #d4edda;text-align: center;">25.3</td>
<td style="background-color: #d4edda;text-align: center;">40.8</td>
</tr>
<tr>
<td style="color: black;"><a href="https://huggingface.co/nebius/SWE-rebench-openhands-Qwen3-235B-A22B">nebius/SWE-rebench-openhands-Qwen3-235B-A22B</a> (Ours)</td>
<td>235B</td>
<td style="background-color: #fff3cd;text-align: center;"><strong>59.9</strong><br/>(+14.7)</td>
<td style="background-color: #fff3cd;text-align: center;"><strong>73.9</strong><br/>(+8.0)</td>
<td style="background-color: #fff3cd;text-align: center;"><strong>35.1</strong><br/>(+5.8)</td>
<td style="background-color: #fff3cd;text-align: center;"><strong>46.9</strong><br/>(+2.1)</td>
<td style="background-color: #d4edda;text-align: center;"><strong>61.7</strong><br/>(+15.5)</td>
<td style="background-color: #d4edda;text-align: center;"><strong>74.3</strong><br/>(+6.8)</td>
<td style="background-color: #d4edda;text-align: center;"><strong>34.2</strong><br/>(+8.9)</td>
<td style="background-color: #d4edda;text-align: center;"><strong>44.8</strong><br/>(+4.0)</td>
</tr>
<tr>
<td colspan="10"><strong>300B+ scale</strong></td>
</tr>
<tr>
<td><a href="https://huggingface.co/zai-org/GLM-4.5">GLM-4.5</a></td>
<td>355B</td>
<td style="background-color: #fff3cd;text-align: center;">64.4</td>
<td style="background-color: #fff3cd;text-align: center;">76.2</td>
<td style="background-color: #fff3cd;text-align: center;">33.8</td>
<td style="background-color: #fff3cd;text-align: center;">44.8</td>
<td style="background-color: #d4edda;text-align: center;">-</td>
<td style="background-color: #d4edda;text-align: center;">-</td>
<td style="background-color: #d4edda;text-align: center;">-</td>
<td style="background-color: #d4edda;text-align: center;">-</td>
</tr>
<tr>
<td><a href="https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct">Qwen3-Coder-480B-A35B-Instruct</a></td>
<td>480B</td>
<td style="background-color: #fff3cd;text-align: center;">64.7</td>
<td style="background-color: #fff3cd;text-align: center;">75.8</td>
<td style="background-color: #fff3cd;text-align: center;">36.3</td>
<td style="background-color: #fff3cd;text-align: center;">44.8</td>
<td style="background-color: #d4edda;text-align: center;">66.5</td>
<td style="background-color: #d4edda;text-align: center;">77.8</td>
<td style="background-color: #d4edda;text-align: center;">35.5</td>
<td style="background-color: #d4edda;text-align: center;">42.8</td>
</tr>
</tbody>
</table>
**Table 1.** Pass@1 (averaged over 5 runs) and Pass@5 for OpenHands agent with the maximum number of turns set to 100
(highlighted in <span style="background-color: #fff3cd; padding: 4px;">yellow</span>) and 500
(highlighted in <span style="background-color: #d4edda; padding: 4px;">green</span>). Metrics are reported in percentages.
Deltas vs base models are shown in parentheses for fine-tuned models.
We explicitly excluded all [SWE-bench Verified](https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified) and
[SWE-rebench September](https://huggingface.co/datasets/nebius/SWE-rebench-leaderboard) issues from training to avoid contamination.
SWE-rebench Verified was additionally decontaminated on repository level.
When evaluated with the OpenHands (v0.54.0) agent, our 30B model:
* Substantially improves over the base Qwen3-30B-A3B-Instruct-2507 model, with **+24.6 Pass@1** and **+24.1 Pass@5** gains at 500-turn settings.
* Matches or surpasses the specialized Qwen3-Coder-30B-A3B-Instruct baseline at 500 turns on SWE-bench Verified (**50.3%** vs **50.0% Pass@1**; **68.3%** vs **63.0% Pass@5**).
* Generalizes to longer interaction horizons, despite training on trajectories capped at 100 turns.
For more details see our report in [Nebius blog](https://nebius.com/blog/posts/openhands-trajectories-with-qwen3-coder-480b).
---
# Best Practices
1. **Deployment:**
* Use the following configuration to serve the model with vLLM:
```bash
VLLM_USE_V1=1 vllm serve nebius/SWE-rebench-openhands-Qwen3-30B-A3B
--tensor-parallel-size 8
--served-model-name qwen_3_instruct_2507
--disable-log-requests
--enable-prefix-caching
--max-model-len 131072
--enable-auto-tool-choice
--tool-call-parser hermes
```
Tested using `vllm/vllm-openai:v0.9.0` Docker image.
2. **Sampling Parameters:**
* For optimal performance, we recommend `Temperature=0.7`, `TopP=0.8`, `TopK=20`, and `MinP=0`
that are consistent with the base model.
---
# Citation
```
@article{trofimova2025openhandstrajs,
title={OpenHands Trajectories with Qwen3-Coder-480B-A35B-Instruct},
author={Trofimova, Maria and Shevtsov, Anton and Ibragim, Badertdinov and Pyaev, Konstantin and Karasik, Simon and Golubev, Alexander},
year={2025},
journal={Nebius blog},
note={}
}
```