vim-ary commited on
Commit
faefe64
·
verified ·
1 Parent(s): 140565c

feat: improve readibility & add best practices section

Browse files
Files changed (1) hide show
  1. README.md +60 -38
README.md CHANGED
@@ -27,14 +27,14 @@ Training used a maximum sequence length of 131k tokens.
27
  <th colspan="4" style="background-color: #d4edda;">Maximum Number of Turns = 500</th>
28
  </tr>
29
  <tr>
30
- <th style="background-color: #fff3cd;">Pass@1, %</th>
31
- <th style="background-color: #fff3cd;">Pass@5, %</th>
32
- <th style="background-color: #fff3cd;">Pass@1, %</th>
33
- <th style="background-color: #fff3cd;">Pass@5, %</th>
34
- <th style="background-color: #d4edda;">Pass@1, %</th>
35
- <th style="background-color: #d4edda;">Pass@5, %</th>
36
- <th style="background-color: #d4edda;">Pass@1, %</th>
37
- <th style="background-color: #d4edda;">Pass@5, %</th>
38
  </tr>
39
  </thead>
40
  <tbody>
@@ -44,37 +44,37 @@ Training used a maximum sequence length of 131k tokens.
44
  <tr>
45
  <td><a href="https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507">Qwen3-30B-A3B-Instruct-2507</a></td>
46
  <td>30B</td>
47
- <td style="background-color: #fff3cd;text-align: center;">25.2 ± 0.7</td>
48
  <td style="background-color: #fff3cd;text-align: center;">44.8</td>
49
- <td style="background-color: #fff3cd;text-align: center;">11.8 ± 1.5</td>
50
  <td style="background-color: #fff3cd;text-align: center;">24.4</td>
51
- <td style="background-color: #d4edda;text-align: center;">25.7 ± 0.5</td>
52
  <td style="background-color: #d4edda;text-align: center;">44.2</td>
53
- <td style="background-color: #d4edda;text-align: center;">14.2 ± 1.1</td>
54
  <td style="background-color: #d4edda;text-align: center;">26.5</td>
55
  </tr>
56
  <tr>
57
  <td><a href="https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct">Qwen3-Coder-30B-A3B-Instruct</a></td>
58
  <td>30B</td>
59
- <td style="background-color: #fff3cd;text-align: center;"><strong>51.9</strong> ± 0.2</td>
60
  <td style="background-color: #fff3cd;text-align: center;"><strong>67.3</strong></td>
61
- <td style="background-color: #fff3cd;text-align: center;"><strong>28.7</strong> ± 1.1</td>
62
  <td style="background-color: #fff3cd;text-align: center;"><strong>42.8</strong></td>
63
- <td style="background-color: #d4edda;text-align: center;"><strong>50.0</strong> ± 0.5</td>
64
  <td style="background-color: #d4edda;text-align: center;">63.0</td>
65
- <td style="background-color: #d4edda;text-align: center;"><strong>28.1</strong> ± 1.5</td>
66
  <td style="background-color: #d4edda;text-align: center;"><strong>38.7</strong></td>
67
  </tr>
68
  <tr style="background-color: #ebeced">
69
  <td style="color: black;">nebius/SWE-rebench-openhands-Qwen3-30B-A3B (Ours)</td>
70
  <td>30B</td>
71
- <td style="background-color: #ffdf80;text-align: center;">49.7 ± 0.9<br/>(+24.5)</td>
72
  <td style="background-color: #ffdf80;text-align: center;">65.4<br/>(+20.6)</td>
73
- <td style="background-color: #ffdf80;text-align: center;">28.1 ± 1.5<br/>(+16.3)</td>
74
  <td style="background-color: #ffdf80;text-align: center;">38.7<br/>(+14.3)</td>
75
- <td style="background-color: #9df2b3;text-align: center;"><strong>50.3</strong> ± 0.7<br/>(+24.6)</td>
76
  <td style="background-color: #9df2b3;text-align: center;"><strong>68.3</strong><br/>(+24.1)</td>
77
- <td style="background-color: #9df2b3;text-align: center;"><strong>28.1</strong> ± 1.0<br/>(+13.9)</td>
78
  <td style="background-color: #9df2b3;text-align: center;"><strong>38.7</strong><br/>(+12.2)</td>
79
  </tr>
80
  <tr>
@@ -83,9 +83,9 @@ Training used a maximum sequence length of 131k tokens.
83
  <tr>
84
  <td><a href="https://huggingface.co/zai-org/GLM-4.5-Air">GLM-4.5-Air</a></td>
85
  <td>106B</td>
86
- <td style="background-color: #fff3cd;text-align: center;">58.2 ± 0.2</td>
87
  <td style="background-color: #fff3cd;text-align: center;">73.5</td>
88
- <td style="background-color: #fff3cd;text-align: center;">33.8 ± 1.2</td>
89
  <td style="background-color: #fff3cd;text-align: center;">42.8</td>
90
  <td style="background-color: #d4edda;text-align: center;">-</td>
91
  <td style="background-color: #d4edda;text-align: center;">-</td>
@@ -98,25 +98,25 @@ Training used a maximum sequence length of 131k tokens.
98
  <tr>
99
  <td><a href="https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507">Qwen3-235B-A22B-Instruct-2507</a></td>
100
  <td>235B</td>
101
- <td style="background-color: #fff3cd;text-align: center;">45.2 ± 0.8</td>
102
  <td style="background-color: #fff3cd;text-align: center;">65.9</td>
103
- <td style="background-color: #fff3cd;text-align: center;">29.3 ± 2.4</td>
104
  <td style="background-color: #fff3cd;text-align: center;">44.8</td>
105
- <td style="background-color: #d4edda;text-align: center;">46.2 ± 0.4</td>
106
  <td style="background-color: #d4edda;text-align: center;">67.5</td>
107
- <td style="background-color: #d4edda;text-align: center;">25.3 ± 1.9</td>
108
  <td style="background-color: #d4edda;text-align: center;">40.8</td>
109
  </tr>
110
  <tr>
111
  <td style="color: black;"><a href="https://huggingface.co/nebius/SWE-rebench-openhands-Qwen3-235B-A22B">nebius/SWE-rebench-openhands-Qwen3-235B-A22B</a> (Ours)</td>
112
  <td>235B</td>
113
- <td style="background-color: #fff3cd;text-align: center;"><strong>59.9</strong> ± 0.1<br/>(+14.7)</td>
114
  <td style="background-color: #fff3cd;text-align: center;"><strong>73.9</strong><br/>(+8.0)</td>
115
- <td style="background-color: #fff3cd;text-align: center;"><strong>35.1</strong> ± 1.0<br/>(+5.8)</td>
116
  <td style="background-color: #fff3cd;text-align: center;"><strong>46.9</strong><br/>(+2.1)</td>
117
- <td style="background-color: #d4edda;text-align: center;"><strong>61.7</strong> ± 0.9<br/>(+15.5)</td>
118
  <td style="background-color: #d4edda;text-align: center;"><strong>74.3</strong><br/>(+6.8)</td>
119
- <td style="background-color: #d4edda;text-align: center;"><strong>34.2</strong> ± 1.5<br/>(+8.9)</td>
120
  <td style="background-color: #d4edda;text-align: center;"><strong>44.8</strong><br/>(+4.0)</td>
121
  </tr>
122
  <tr>
@@ -125,9 +125,9 @@ Training used a maximum sequence length of 131k tokens.
125
  <tr>
126
  <td><a href="https://huggingface.co/zai-org/GLM-4.5">GLM-4.5</a></td>
127
  <td>355B</td>
128
- <td style="background-color: #fff3cd;text-align: center;">64.4 ± 0.5</td>
129
  <td style="background-color: #fff3cd;text-align: center;">76.2</td>
130
- <td style="background-color: #fff3cd;text-align: center;">33.8 ± 1.7</td>
131
  <td style="background-color: #fff3cd;text-align: center;">44.8</td>
132
  <td style="background-color: #d4edda;text-align: center;">-</td>
133
  <td style="background-color: #d4edda;text-align: center;">-</td>
@@ -137,21 +137,21 @@ Training used a maximum sequence length of 131k tokens.
137
  <tr>
138
  <td><a href="https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct">Qwen3-Coder-480B-A35B-Instruct</a></td>
139
  <td>480B</td>
140
- <td style="background-color: #fff3cd;text-align: center;">64.7 ± 0.5</td>
141
  <td style="background-color: #fff3cd;text-align: center;">75.8</td>
142
- <td style="background-color: #fff3cd;text-align: center;">36.3 ± 1.6</td>
143
  <td style="background-color: #fff3cd;text-align: center;">44.8</td>
144
- <td style="background-color: #d4edda;text-align: center;">66.5 ± 0.4</td>
145
  <td style="background-color: #d4edda;text-align: center;">77.8</td>
146
- <td style="background-color: #d4edda;text-align: center;">35.5 ± 1.4</td>
147
  <td style="background-color: #d4edda;text-align: center;">42.8</td>
148
  </tr>
149
  </tbody>
150
  </table>
151
 
152
- **Table 1.** Pass@1 with standard error of the mean and Pass@5 for OpenHands agent with the maximum number of turns set to 100
153
  (highlighted in <span style="background-color: #fff3cd; padding: 4px;">yellow</span>) and 500
154
- (highlighted in <span style="background-color: #d4edda; padding: 4px;">green</span>).
155
  Deltas vs base models are shown in parentheses for fine-tuned models.
156
 
157
  We explicitly excluded all [SWE-bench Verified](https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified) and
@@ -168,6 +168,28 @@ For more details see our report in [Nebius blog](LINK-TO-BE-ADDED).
168
 
169
  ---
170
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
171
  # Citation
172
 
173
  ```
 
27
  <th colspan="4" style="background-color: #d4edda;">Maximum Number of Turns = 500</th>
28
  </tr>
29
  <tr>
30
+ <th style="background-color: #fff3cd;">Pass@1</th>
31
+ <th style="background-color: #fff3cd;">Pass@5</th>
32
+ <th style="background-color: #fff3cd;">Pass@1</th>
33
+ <th style="background-color: #fff3cd;">Pass@5</th>
34
+ <th style="background-color: #d4edda;">Pass@1</th>
35
+ <th style="background-color: #d4edda;">Pass@5</th>
36
+ <th style="background-color: #d4edda;">Pass@1</th>
37
+ <th style="background-color: #d4edda;">Pass@5</th>
38
  </tr>
39
  </thead>
40
  <tbody>
 
44
  <tr>
45
  <td><a href="https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507">Qwen3-30B-A3B-Instruct-2507</a></td>
46
  <td>30B</td>
47
+ <td style="background-color: #fff3cd;text-align: center;">25.2</td>
48
  <td style="background-color: #fff3cd;text-align: center;">44.8</td>
49
+ <td style="background-color: #fff3cd;text-align: center;">11.8</td>
50
  <td style="background-color: #fff3cd;text-align: center;">24.4</td>
51
+ <td style="background-color: #d4edda;text-align: center;">25.7</td>
52
  <td style="background-color: #d4edda;text-align: center;">44.2</td>
53
+ <td style="background-color: #d4edda;text-align: center;">14.2</td>
54
  <td style="background-color: #d4edda;text-align: center;">26.5</td>
55
  </tr>
56
  <tr>
57
  <td><a href="https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct">Qwen3-Coder-30B-A3B-Instruct</a></td>
58
  <td>30B</td>
59
+ <td style="background-color: #fff3cd;text-align: center;"><strong>51.9</strong></td>
60
  <td style="background-color: #fff3cd;text-align: center;"><strong>67.3</strong></td>
61
+ <td style="background-color: #fff3cd;text-align: center;"><strong>28.7</strong></td>
62
  <td style="background-color: #fff3cd;text-align: center;"><strong>42.8</strong></td>
63
+ <td style="background-color: #d4edda;text-align: center;"><strong>50.0</strong></td>
64
  <td style="background-color: #d4edda;text-align: center;">63.0</td>
65
+ <td style="background-color: #d4edda;text-align: center;"><strong>28.1</strong></td>
66
  <td style="background-color: #d4edda;text-align: center;"><strong>38.7</strong></td>
67
  </tr>
68
  <tr style="background-color: #ebeced">
69
  <td style="color: black;">nebius/SWE-rebench-openhands-Qwen3-30B-A3B (Ours)</td>
70
  <td>30B</td>
71
+ <td style="background-color: #ffdf80;text-align: center;">49.7<br/>(+24.5)</td>
72
  <td style="background-color: #ffdf80;text-align: center;">65.4<br/>(+20.6)</td>
73
+ <td style="background-color: #ffdf80;text-align: center;">28.1<br/>(+16.3)</td>
74
  <td style="background-color: #ffdf80;text-align: center;">38.7<br/>(+14.3)</td>
75
+ <td style="background-color: #9df2b3;text-align: center;"><strong>50.3</strong><br/>(+24.6)</td>
76
  <td style="background-color: #9df2b3;text-align: center;"><strong>68.3</strong><br/>(+24.1)</td>
77
+ <td style="background-color: #9df2b3;text-align: center;"><strong>28.1</strong><br/>(+13.9)</td>
78
  <td style="background-color: #9df2b3;text-align: center;"><strong>38.7</strong><br/>(+12.2)</td>
79
  </tr>
80
  <tr>
 
83
  <tr>
84
  <td><a href="https://huggingface.co/zai-org/GLM-4.5-Air">GLM-4.5-Air</a></td>
85
  <td>106B</td>
86
+ <td style="background-color: #fff3cd;text-align: center;">58.2</td>
87
  <td style="background-color: #fff3cd;text-align: center;">73.5</td>
88
+ <td style="background-color: #fff3cd;text-align: center;">33.8</td>
89
  <td style="background-color: #fff3cd;text-align: center;">42.8</td>
90
  <td style="background-color: #d4edda;text-align: center;">-</td>
91
  <td style="background-color: #d4edda;text-align: center;">-</td>
 
98
  <tr>
99
  <td><a href="https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507">Qwen3-235B-A22B-Instruct-2507</a></td>
100
  <td>235B</td>
101
+ <td style="background-color: #fff3cd;text-align: center;">45.2</td>
102
  <td style="background-color: #fff3cd;text-align: center;">65.9</td>
103
+ <td style="background-color: #fff3cd;text-align: center;">29.3</td>
104
  <td style="background-color: #fff3cd;text-align: center;">44.8</td>
105
+ <td style="background-color: #d4edda;text-align: center;">46.2</td>
106
  <td style="background-color: #d4edda;text-align: center;">67.5</td>
107
+ <td style="background-color: #d4edda;text-align: center;">25.3</td>
108
  <td style="background-color: #d4edda;text-align: center;">40.8</td>
109
  </tr>
110
  <tr>
111
  <td style="color: black;"><a href="https://huggingface.co/nebius/SWE-rebench-openhands-Qwen3-235B-A22B">nebius/SWE-rebench-openhands-Qwen3-235B-A22B</a> (Ours)</td>
112
  <td>235B</td>
113
+ <td style="background-color: #fff3cd;text-align: center;"><strong>59.9</strong><br/>(+14.7)</td>
114
  <td style="background-color: #fff3cd;text-align: center;"><strong>73.9</strong><br/>(+8.0)</td>
115
+ <td style="background-color: #fff3cd;text-align: center;"><strong>35.1</strong><br/>(+5.8)</td>
116
  <td style="background-color: #fff3cd;text-align: center;"><strong>46.9</strong><br/>(+2.1)</td>
117
+ <td style="background-color: #d4edda;text-align: center;"><strong>61.7</strong><br/>(+15.5)</td>
118
  <td style="background-color: #d4edda;text-align: center;"><strong>74.3</strong><br/>(+6.8)</td>
119
+ <td style="background-color: #d4edda;text-align: center;"><strong>34.2</strong><br/>(+8.9)</td>
120
  <td style="background-color: #d4edda;text-align: center;"><strong>44.8</strong><br/>(+4.0)</td>
121
  </tr>
122
  <tr>
 
125
  <tr>
126
  <td><a href="https://huggingface.co/zai-org/GLM-4.5">GLM-4.5</a></td>
127
  <td>355B</td>
128
+ <td style="background-color: #fff3cd;text-align: center;">64.4</td>
129
  <td style="background-color: #fff3cd;text-align: center;">76.2</td>
130
+ <td style="background-color: #fff3cd;text-align: center;">33.8</td>
131
  <td style="background-color: #fff3cd;text-align: center;">44.8</td>
132
  <td style="background-color: #d4edda;text-align: center;">-</td>
133
  <td style="background-color: #d4edda;text-align: center;">-</td>
 
137
  <tr>
138
  <td><a href="https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct">Qwen3-Coder-480B-A35B-Instruct</a></td>
139
  <td>480B</td>
140
+ <td style="background-color: #fff3cd;text-align: center;">64.7</td>
141
  <td style="background-color: #fff3cd;text-align: center;">75.8</td>
142
+ <td style="background-color: #fff3cd;text-align: center;">36.3</td>
143
  <td style="background-color: #fff3cd;text-align: center;">44.8</td>
144
+ <td style="background-color: #d4edda;text-align: center;">66.5</td>
145
  <td style="background-color: #d4edda;text-align: center;">77.8</td>
146
+ <td style="background-color: #d4edda;text-align: center;">35.5</td>
147
  <td style="background-color: #d4edda;text-align: center;">42.8</td>
148
  </tr>
149
  </tbody>
150
  </table>
151
 
152
+ **Table 1.** Pass@1 (averaged over 5 runs) and Pass@5 for OpenHands agent with the maximum number of turns set to 100
153
  (highlighted in <span style="background-color: #fff3cd; padding: 4px;">yellow</span>) and 500
154
+ (highlighted in <span style="background-color: #d4edda; padding: 4px;">green</span>). Metrics are reported in percentages.
155
  Deltas vs base models are shown in parentheses for fine-tuned models.
156
 
157
  We explicitly excluded all [SWE-bench Verified](https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified) and
 
168
 
169
  ---
170
 
171
+ # Best Practices
172
+
173
+ 1. **Deployment:**
174
+ * Use the following configuration to serve the model with vLLM:
175
+ ```bash
176
+ VLLM_USE_V1=1 vllm serve nebius/SWE-rebench-openhands-Qwen3-30B-A3B
177
+ --tensor-parallel-size 8
178
+ --served-model-name qwen_3_instruct_2507
179
+ --disable-log-requests
180
+ --enable-prefix-caching
181
+ --max-model-len 131072
182
+ --enable-auto-tool-choice
183
+ --tool-call-parser hermes
184
+ ```
185
+ Tested using `vllm/vllm-openai:v0.9.0` Docker image.
186
+
187
+ 2. **Sampling Parameters:**
188
+ * For optimal performance, we recommend `Temperature=0.7`, `TopP=0.8`, `TopK=20`, and `MinP=0`
189
+ that are consistent with the base model.
190
+
191
+ ---
192
+
193
  # Citation
194
 
195
  ```