PCL-Reasoner-V1.5
Model Overview
We release PCL-Reasoner-V1.5, a next-generation reasoning model built upon PCL-Reasoner-V1 and further enhanced with an offline reinforcement learning method on the vllm-ascend and MindSpeed-LLM frameworks with Ascend hardware acceleration. Building on the strong foundation of PCL-Reasoner-V1, PCL-Reasoner-V1.5 achieves even greater improvement in complex mathematical reasoning with long chains of thought (CoT), demonstrating state-of-the-art performance among 32B-scale models.
PCL-Reasoner-V1.5 attains 90.9% on AIME 2024 and 85.7% on AIME 2025, significantly outperforming prior 32B-class models and closing the gap with much larger systems. This advancement stems from refined data curation, improved contamination filtering, and optimized training dynamics tailored for deep reasoning tasks.
We have fully open-sourced the model weights, dataset, and training code to foster transparency, reproducibility, and community innovation. Follow the tutorial below to deploy, evaluate, or extend PCL-Reasoner-V1.5 in your own research!
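As a quick-start sketch (not the official tutorial), the released weights can be loaded with Hugging Face `transformers`. The repo id `PCL-Reasoner/V1.5` is taken from this card; the dtype/device settings and sampling parameters below are illustrative assumptions and require a multi-GPU or large-memory setup for a 32B model.

```python
# Minimal inference sketch; repo id assumed from this card, generation
# settings are illustrative rather than the authors' recommended config.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PCL-Reasoner/V1.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Long-CoT models emit extended reasoning, so allow a generous token budget.
messages = [{"role": "user", "content": "Find the remainder when 7^100 is divided by 13."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=4096, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```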
Code
Evaluation
All results are reported using the Avg@32 metric (average accuracy over 32 independent sampling attempts per problem), ensuring robust and fair comparison.
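The Avg@32 computation described above can be sketched as follows; the function name and the toy data are hypothetical, and `k` would be 32 in the actual evaluation.

```python
# Avg@k: for each problem, average the 0/1 correctness over k independent
# samples, then average those per-problem rates across all problems.
def avg_at_k(results: dict[str, list[int]]) -> float:
    """results maps a problem id to the 0/1 correctness of each sample."""
    per_problem = [sum(samples) / len(samples) for samples in results.values()]
    return sum(per_problem) / len(per_problem)

# Toy example with 2 problems and 4 samples each (the paper uses k=32):
toy = {"p1": [1, 1, 0, 1], "p2": [0, 1, 0, 0]}
print(avg_at_k(toy))  # (0.75 + 0.25) / 2 = 0.5
```

Averaging over many samples per problem reduces the variance that single-shot sampling introduces on small benchmarks like AIME (30 problems per year).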
| Model Scale | Model | AIME 24 | AIME 25 |
|---|---|---|---|
| >100B | DeepSeek-R1 | 79.8 | 70.0 |
| | DeepSeek-R1-0528 | 91.4 | 87.5 |
| | Qwen3-235B-A22B | 85.7 | 81.5 |
| | OpenAI-o3 | 91.6 | 88.9 |
| | Gemini-2.5-Pro-0506 | 90.8 | 83.0 |
| 32B | Qwen3-32B | 81.4 | 72.9 |
| | QwQ-32B | 79.5 | 69.5 |
| | DeepSeek-R1-Distill-Qwen-32B | 72.6 | 49.6 |
| | Skywork-OR1-32B | 82.2 | 73.3 |
| | AM-Thinking-v1 | 85.3 | 74.4 |
| | OpenReasoning-Nemotron-32B | 89.2 | 84.2 |
| | PCL-Reasoner-v1 | 85.7 | 84.2 |
| | PCL-Reasoner-v1.5 | 90.9 | 85.7 |
Note: Model outputs on AIME24/25 are included in the repository under `eval_result/` for verification and analysis.
Citation
```bibtex
@article{PCL-Reasoner-v1.5,
  title={PCL-Reasoner-v1.5: A Math Problem Solver with Chain of Thought Reasoning},
  author={Yao Lu and Deng Dong Fan and Jianzheng Nie and others},
  journal={arXiv preprint arXiv:2405.14524},
  year={2026}
}
```
Model tree for PCL-Reasoner/V1.5
Base model: Qwen/Qwen2.5-32B