Nemotron-Cascade-8B Intermediate ckpts

Technical Report · SFT Dataset · RL Dataset · Models

Introduction

This repository releases the intermediate checkpoints produced during the development of Nemotron-Cascade-8B. Nemotron-Cascade-8B is a general-purpose model trained using a sequential, domain-wise reinforcement learning pipeline, illustrated in the figure below.

(Figure: the sequential, domain-wise reinforcement learning training pipeline.)

We release checkpoints corresponding to each major stage of training:

  • Nemotron-Cascade-8B-SFT (completed multi-stage SFT)
  • Nemotron-Cascade-8B-RLHF (completed RLHF)
  • Nemotron-Cascade-8B-IFRL (completed instruction following RL)
  • Nemotron-Cascade-8B-MathRL (completed Math RL)
  • Nemotron-Cascade-8B-CodeRL (completed Code RL)

The final model, Nemotron-Cascade-8B, is obtained after the concluding SWE RL stage.

Usage Recommendations

We recommend using RoPE scaling with the YaRN method to better support contexts longer than 32K tokens. Enable it by updating the model's config.json as shown below:

  {
    ...,
    "rope_scaling": {
        "rope_type": "yarn",
        "factor": 2.0,
        "original_max_position_embeddings": 32768
    }
  }
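If you prefer not to edit config.json on disk, the same setting can be applied at load time. Below is a minimal sketch using Hugging Face transformers; the checkpoint id is an assumption, substitute whichever checkpoint you are loading:

  from transformers import AutoConfig, AutoModelForCausalLM

  model_id = "nvidia/Nemotron-Cascade-8B-SFT"  # assumed checkpoint id

  # Equivalent to the config.json edit above: enable YaRN RoPE scaling at load time.
  config = AutoConfig.from_pretrained(model_id)
  config.rope_scaling = {
      "rope_type": "yarn",
      "factor": 2.0,
      "original_max_position_embeddings": 32768,
  }

  model = AutoModelForCausalLM.from_pretrained(
      model_id, config=config, torch_dtype="auto", device_map="auto"
  )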

Results

As with Nemotron-Cascade-8B, we use a maximum output length of 64K tokens for evaluation, with the temperature set to 0.6 and top-p set to 0.95. We also apply RoPE scaling using the YaRN method with a scaling factor of 2.0. A minimal decoding sketch with these settings follows the table.

| Benchmark (Pass@1) | Nemotron-Cascade-8B-SFT | Nemotron-Cascade-8B-RLHF | Nemotron-Cascade-8B-IFRL | Nemotron-Cascade-8B-MathRL | Nemotron-Cascade-8B-CodeRL | Nemotron-Cascade-8B |
|---|---|---|---|---|---|---|
| Knowledge Reasoning | | | | | | |
| MMLU | 83.0 | 83.1 | 83.4 | 83.4 | 83.7 | 83.7 |
| MMLU Pro | 74.4 | 77.8 | 74.5 | 75.0 | 75.3 | 75.7 |
| GPQA-Diamond | 63.5 | 66.8 | 66.1 | 65.7 | 67.4 | 66.5 |
| Alignment | | | | | | |
| ArenaHard | 70.0 | 90.1 | 88.0 | 87.0 | 87.8 | 87.9 |
| IFEval (Strict Prompt) | 70.8 | 50.1 | 90.4 | 92.1 | 90.7 | 90.2 |
| IFBench | 21.2 | 24.5 | 40.5 | 40.4 | 38.1 | 40.8 |
| Math | | | | | | |
| AIME 2024 | 83.6 | 86.1 | 86.2 | 90.2 | 89.1 | 89.5 |
| AIME 2025 | 72.8 | 75.0 | 75.2 | 81.9 | 80.5 | 80.1 |
| Code | | | | | | |
| LCB v5 (08/24-02/25) | 59.2 | 70.2 | 70.2 | 70.6 | 75.3 | 74.3 |
| LCB v6 (08/24-05/25) | 56.7 | 67.2 | 66.7 | 67.4 | 71.5 | 71.1 |
| SWE Verified (Agentless) | 26.1 | 28.2 | 28.3 | 30.6 | 31.6 | 37.2 |
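For reference, here is a minimal decoding sketch with these evaluation settings, using Hugging Face transformers. The checkpoint id and the prompt are assumptions; in practice the model should also be loaded with the YaRN configuration described above.

  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "nvidia/Nemotron-Cascade-8B"  # assumed id; substitute the checkpoint being evaluated

  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

  # Build a thinking-mode prompt and sample with the evaluation settings above.
  messages = [{"role": "user", "content": "Solve: if 3x + 7 = 25, what is x? /think"}]
  input_ids = tokenizer.apply_chat_template(
      messages, add_generation_prompt=True, return_tensors="pt"
  ).to(model.device)

  output_ids = model.generate(
      input_ids,
      do_sample=True,
      temperature=0.6,
      top_p=0.95,
      max_new_tokens=65536,  # 64K-token output budget
  )
  print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))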

Chat Template

All intermediate checkpoints use the same chat template as Nemotron-Cascade-8B. Each is a unified model supporting both thinking and instruct (non-reasoning) modes. To switch between these two modes, simply append the " /think" (for thinking) or the " /no_think" (for instruct) tag to the end of the user input. See Nemotron-Cascade-8B for additional details.
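For illustration, a minimal prompt-construction sketch with Hugging Face transformers (the checkpoint id is an assumption; the mode tags are appended exactly as described above):

  from transformers import AutoTokenizer

  model_id = "nvidia/Nemotron-Cascade-8B-SFT"  # assumed checkpoint id
  tokenizer = AutoTokenizer.from_pretrained(model_id)

  # Thinking mode: append " /think"; instruct mode: append " /no_think".
  messages = [{"role": "user", "content": "Summarize the YaRN method in two sentences. /no_think"}]

  prompt = tokenizer.apply_chat_template(
      messages, tokenize=False, add_generation_prompt=True
  )
  print(prompt)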

Release Date

Dec 19, 2025

License

Your use of this model is governed by the NVIDIA Open Model License.

Citation

@article{Nemotron_Cascade_Scaling_Cascaded_Reinforcement_Learning,
  title={Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models},
  author={Wang, Boxin and Lee, Chankyu and Lee, Nayeon and Lin, Sheng-Chieh and Dai, Wenliang and Chen, Yang and Chen, Yangyi and Yang, Zhuolin and Liu, Zihan and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  year={2025}
}