Nemotron-Cascade-8B Intermediate Checkpoints
Introduction
This repository releases the intermediate checkpoints produced during the development of Nemotron-Cascade-8B. Nemotron-Cascade-8B is a general-purpose model trained using a sequential, domain-wise reinforcement learning pipeline, illustrated in the figure below.
We release checkpoints corresponding to each major stage of training:
- Nemotron-Cascade-8B-SFT (completed multi-stage SFT)
- Nemotron-Cascade-8B-RLHF (completed RLHF)
- Nemotron-Cascade-8B-IFRL (completed instruction following RL)
- Nemotron-Cascade-8B-MathRL (completed Math RL)
- Nemotron-Cascade-8B-CodeRL (completed Code RL)
The final model, Nemotron-Cascade-8B, is obtained after the concluding SWE RL stage.
Usage Recommendations
We recommend using RoPE scaling with the YaRN method to better support contexts longer than 32K. This can be enabled by updating the model’s config.json as shown below:
```json
{
  ...,
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 2.0,
    "original_max_position_embeddings": 32768
  }
}
```
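The same scaling can also be applied at load time instead of editing config.json by hand. The snippet below is a minimal sketch assuming the checkpoints load with Hugging Face transformers; the repository id used here is illustrative.

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint id; substitute the intermediate checkpoint you want to load.
model_id = "nvidia/Nemotron-Cascade-8B-SFT"

# Apply the same YaRN RoPE scaling shown in the config.json snippet above.
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 2.0,
    "original_max_position_embeddings": 32768,
}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, config=config, torch_dtype="auto")
```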
Results
As with Nemotron-Cascade-8B, we use a maximum output length of 64K tokens for evaluation, with the temperature set to 0.6 and top-p to 0.95. We also apply RoPE scaling using the YaRN method with a scaling factor of 2.0.
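For reference, these decoding settings map onto a standard sampling configuration such as the one below. This is a minimal sketch using transformers' GenerationConfig; the actual evaluation harness is not specified here, so treat the object name and surrounding setup as illustrative.

```python
from transformers import GenerationConfig

# Decoding settings used for the evaluations reported below:
# temperature 0.6, top-p 0.95, and up to 64K output tokens.
eval_generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    max_new_tokens=65536,
)
```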
| Benchmark (Metric: Pass@1) | Nemotron-Cascade-8B-SFT | Nemotron-Cascade-8B-RLHF | Nemotron-Cascade-8B-IFRL | Nemotron-Cascade-8B-MathRL | Nemotron-Cascade-8B-CodeRL | Nemotron-Cascade-8B |
|---|---|---|---|---|---|---|
| Knowledge Reasoning | ||||||
| MMLU | 83.0 | 83.1 | 83.4 | 83.4 | 83.7 | 83.7 |
| MMLU Pro | 74.4 | 77.8 | 74.5 | 75.0 | 75.3 | 75.7 |
| GPQA-Diamond | 63.5 | 66.8 | 66.1 | 65.7 | 67.4 | 66.5 |
| Alignment | ||||||
| ArenaHard | 70.0 | 90.1 | 88.0 | 87.0 | 87.8 | 87.9 |
| IFEval (Strict Prompt) | 70.8 | 50.1 | 90.4 | 92.1 | 90.7 | 90.2 |
| IFBench | 21.2 | 24.5 | 40.5 | 40.4 | 38.1 | 40.8 |
| Math | ||||||
| AIME 2024 | 83.6 | 86.1 | 86.2 | 90.2 | 89.1 | 89.5 |
| AIME 2025 | 72.8 | 75.0 | 75.2 | 81.9 | 80.5 | 80.1 |
| Code | ||||||
| LCB v5 (08/24-02/25) | 59.2 | 70.2 | 70.2 | 70.6 | 75.3 | 74.3 |
| LCB v6 (08/24-05/25) | 56.7 | 67.2 | 66.7 | 67.4 | 71.5 | 71.1 |
| SWE Verified (Agentless) | 26.1 | 28.2 | 28.3 | 30.6 | 31.6 | 37.2 |
Chat Template
All intermediate checkpoints use the same chat template as Nemotron-Cascade-8B. Each is a unified model supporting both thinking and instruct (non-reasoning) modes. To switch between the two modes, simply append " /think" (thinking) or " /no_think" (instruct) to the end of the user input. See Nemotron-Cascade-8B for additional details.
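The sketch below shows one way to switch modes, assuming the model and tokenizer are loaded as in the Usage Recommendations section and that the checkpoints ship a standard Hugging Face chat template; the prompts and the reduced max_new_tokens are illustrative.

```python
# Thinking mode: append " /think" to the end of the user turn.
messages = [
    {"role": "user", "content": "Prove that the sum of two even integers is even. /think"},
]
# Instruct (non-reasoning) mode would instead end the turn with " /no_think".

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    max_new_tokens=4096,  # illustrative; the evaluations above allow up to 64K tokens
)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```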
Release Date
Dec 19, 2025
License
Your use of this model is governed by the NVIDIA Open Model License.
Citation
```bibtex
@article{Nemotron_Cascade_Scaling_Cascaded_Reinforcement_Learning,
  title={Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models},
  author={Wang, Boxin and Lee, Chankyu and Lee, Nayeon and Lin, Sheng-Chieh and Dai, Wenliang and Chen, Yang and Chen, Yangyi and Yang, Zhuolin and Liu, Zihan and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  year={2025}
}
```