Model Summary
SWE-rebench-openhands-Qwen3-235B-A22B is a 235B Rejection Sampling Fine-Tuning (RFT) checkpoint derived from Qwen/Qwen3-235B-A22B-Instruct-2507, trained on the newly released nebius/SWE-rebench-openhands-trajectories dataset. Training used a maximum sequence length of 131k tokens.
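The training trajectories are published on the Hugging Face Hub and can be inspected with the `datasets` library. Below is a minimal sketch; the `train` split name and record layout are assumptions, so check the dataset card for the actual schema:

```python
# Minimal sketch: load and inspect the rejection-sampled OpenHands trajectories.
from datasets import load_dataset

# Split name is an assumption -- see the dataset card for the actual splits.
ds = load_dataset("nebius/SWE-rebench-openhands-trajectories", split="train")
print(ds)      # dataset size and available columns
print(ds[0])   # a single training trajectory
```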
| Model | Size | SWE-bench Verified Pass@1 (100) | SWE-bench Verified Pass@5 (100) | SWE-rebench Pass@1 (100) | SWE-rebench Pass@5 (100) | SWE-bench Verified Pass@1 (500) | SWE-bench Verified Pass@5 (500) | SWE-rebench Pass@1 (500) | SWE-rebench Pass@5 (500) |
|---|---|---|---|---|---|---|---|---|---|
| **30B scale** | | | | | | | | | |
| Qwen3-30B-A3B-Instruct-2507 | 30B | 25.2 | 44.8 | 11.8 | 24.4 | 25.7 | 44.2 | 14.2 | 26.5 |
| Qwen3-Coder-30B-A3B-Instruct | 30B | 51.9 | 67.3 | 28.7 | 42.8 | 50.0 | 63.0 | 28.1 | 38.7 |
| nebius/SWE-rebench-openhands-Qwen3-30B-A3B (Ours) | 30B | 49.7 (+24.5) | 65.4 (+20.6) | 28.1 (+16.3) | 38.7 (+14.3) | 50.3 (+24.6) | 68.3 (+24.1) | 28.1 (+13.9) | 38.7 (+12.2) |
| **100B+ scale** | | | | | | | | | |
| GLM-4.5-Air | 106B | 58.2 | 73.5 | 33.8 | 42.8 | - | - | - | - |
| **200B+ scale** | | | | | | | | | |
| Qwen3-235B-A22B-Instruct-2507 | 235B | 45.2 | 65.9 | 29.3 | 44.8 | 46.2 | 67.5 | 25.3 | 40.8 |
| nebius/SWE-rebench-openhands-Qwen3-235B-A22B (Ours) | 235B | 59.9 (+14.7) | 73.9 (+8.0) | 35.1 (+5.8) | 46.9 (+2.1) | 61.7 (+15.5) | 74.3 (+6.8) | 34.2 (+8.9) | 44.8 (+4.0) |
| **300B+ scale** | | | | | | | | | |
| GLM-4.5 | 355B | 64.4 | 76.2 | 33.8 | 44.8 | - | - | - | - |
| Qwen3-Coder-480B-A35B-Instruct | 480B | 64.7 | 75.8 | 36.3 | 44.8 | 66.5 | 77.8 | 35.5 | 42.8 |
Table 1. Pass@1 (averaged over 5 runs) and Pass@5 on SWE-bench Verified and SWE-rebench for the OpenHands agent, with the maximum number of turns set to 100 or 500 ((100)/(500) in the column headers). Metrics are reported in percentages. Deltas vs. the corresponding base models are shown in parentheses for fine-tuned models.
We explicitly excluded all SWE-bench Verified and SWE-rebench September issues from training to avoid contamination. SWE-rebench Verified was additionally decontaminated at the repository level.
When evaluated with the OpenHands (v0.54.0) agent, our 235B model:
- Achieves 61.7% Pass@1 and 74.3% Pass@5 in the 500-turn setting, outperforming the 30B coding specialist Qwen3-Coder-30B-A3B-Instruct while using half the parameters of Qwen3-Coder-480B-A35B-Instruct.
- Delivers strong improvements over its base model, with gains of +15.5 Pass@1 and +6.8 Pass@5 at 500 turns.
- Maintains competitive performance under both the 100-turn and 500-turn configurations, despite being trained on trajectories capped at 100 turns.
For more details, see our report on the Nebius blog.
Best Practices
Deployment:
- Use the following configuration to serve the model with vLLM (tested with the `vllm/vllm-openai:v0.9.0` Docker image):

```bash
VLLM_USE_V1=1 vllm serve nebius/SWE-rebench-openhands-Qwen3-235B-A22B \
  --tensor-parallel-size 8 \
  --served-model-name qwen_3_instruct_2507 \
  --disable-log-requests \
  --enable-prefix-caching \
  --max-model-len 131072 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```
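Once the server is running, it exposes an OpenAI-compatible API. Below is a minimal sketch of sending a request with the `openai` Python client; the host, port, and prompt are illustrative assumptions, not part of the original card:

```python
# Minimal sketch: query the vLLM OpenAI-compatible endpoint started above.
# localhost:8000 is the vLLM default; adjust to your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="qwen_3_instruct_2507",  # must match --served-model-name above
    messages=[{"role": "user", "content": "List the steps to reproduce a failing pytest test."}],
)
print(response.choices[0].message.content)
```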
Sampling Parameters:
- For optimal performance, we recommend `Temperature=0.7`, `TopP=0.8`, `TopK=20`, and `MinP=0`, consistent with the base model's recommended settings. These can be passed per request, as shown in the sketch below.
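A minimal sketch of applying these sampling parameters through the same OpenAI-compatible API, reusing the `client` from the deployment example above; note that `top_k` and `min_p` are not part of the OpenAI schema, so vLLM accepts them via `extra_body`:

```python
# Minimal sketch: apply the recommended sampling parameters per request.
response = client.chat.completions.create(
    model="qwen_3_instruct_2507",
    messages=[{"role": "user", "content": "Explain this stack trace."}],
    temperature=0.7,
    top_p=0.8,
    extra_body={"top_k": 20, "min_p": 0},  # vLLM-specific sampling parameters
)
```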
Citation
```bibtex
@article{trofimova2025openhandstrajs,
  title={OpenHands Trajectories with Qwen3-Coder-480B-A35B-Instruct},
  author={Trofimova, Maria and Shevtsov, Anton and Badertdinov, Ibragim and Pyaev, Konstantin and Karasik, Simon and Golubev, Alexander},
  year={2025},
  journal={Nebius blog},
}
```