# <span id="Performance">3. Model Performance</span>

For model performance comparison, we benchmark our model against recent reasoning LLMs from the Qwen3 series. All models are evaluated under identical configurations to ensure fairness. The results show that our model outperforms the baselines across a range of mainstream benchmarks, including **math, science, creative writing, tool use, and human preference alignment**.

| Model | AIME24 | AIME25 | GPQA | Super-GPQA | Science-QA | Writing-Bench | BFCL-V4-Agentic | Arena-hard2 |
|-------|--------|--------|------|------------|------------|---------------|-----------------|-------------|