flust committed on
Commit 78d9793 · verified
1 Parent(s): c905360

Update README.md

  1. README.md +1 -1
README.md CHANGED
@@ -66,7 +66,7 @@ This approach leverages **verifiable rewards** to enhance reasoning capability a
  # <span id="Performance">3. Model Performance</span>
  For model performance comparison, we benchmark our model against recent reasoning LLMs from the Qwen3 series.
  All models are evaluated under identical configurations to ensure fairness.
- The results show that our model outperforms the baselines across a range of mainstream benchmarks, including math, science, creative writing, tool use, and human preference alignment.
+ The results show that our model outperforms the baselines across a range of mainstream benchmarks, including **math, science, creative writing, tool use, and human preference alignment**.
 
  | Model | AIME24 | AIME25 | GPQA | Super-GPQA | Science-QA | Writing-Bench | BFCL-V4-Agentic | Arena-hard2 |
  |----------------|--------|--------|------|------------|------------|--------------|----------------|-------------|