---
library_name: transformers
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
pipeline_tag: text-generation
language:
- en
tags:
- nvidia
- nemotron-cascade
- reasoning
- general-purpose
- SFT
- RL
- pytorch
base_model: Qwen/Qwen3-14B-Base
---

# Nemotron-Cascade-14B-Thinking

<p align="center">

[arXiv](https://arxiv.org/abs/2512.13607)
[Hugging Face Collection](https://huggingface.co/collections/nvidia/nemotron-cascade)

</p>

<img src="fig/nemotron-cascade-14b-thinking-results.png" alt="main_fig" style="width: 1000px; max-width: 100%;" />

## Introduction

We're excited to introduce [Nemotron-Cascade-14B-Thinking](https://huggingface.co/nvidia/Nemotron-Cascade-14B-Thinking), a powerful general-purpose model trained through sequential and domain-wise reinforcement learning. Nemotron-Cascade-14B-Thinking is post-trained from the [Qwen3-14B Base](https://huggingface.co/Qwen/Qwen3-14B-Base) model and achieves best-in-class performance across a wide range of benchmarks. Unlike [Nemotron-Cascade-8B](https://huggingface.co/nvidia/Nemotron-Cascade-8B), Nemotron-Cascade-14B-Thinking is designed exclusively for the ***thinking*** mode.

## Training Pipeline

<img src="fig/pipeline.png" alt="train_pipeline_fig" style="width: 1000px; max-width: 100%;" />

The training pipeline for Nemotron-Cascade begins with a multi-stage SFT phase to equip the model with foundational skills. Cascade RL is then applied sequentially across multiple domains to further enhance the model's performance in each of them.

Notably, RLHF for alignment, when used as a pre-step, boosts the model's complex reasoning ability far beyond mere preference optimization. Moreover, subsequent domain-wise RLVR stages rarely degrade the benchmark performance attained in earlier domains and may even improve it, as illustrated in the figure below.

<figure style="margin: 0; padding: 0;">
<img src="fig/lcb_through_cascade_rl.png" alt="lcb_through_cascade_rl_fig" style="width: 100%; max-width: 100%; margin: 0; padding: 0;">
<figcaption>The LiveCodeBench v6 (08/24–05/25) performance of the Nemotron-Cascade-14B-Thinking model throughout the Cascade RL process.</figcaption>
</figure>
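
To make the ordering concrete, the snippet below is a conceptual sketch of such a cascaded schedule: multi-stage SFT first, then an alignment RL stage, then domain-wise RLVR stages, each initialized from the checkpoint produced by the previous stage. The stage names and stub trainers are illustrative placeholders, not the actual Nemotron-Cascade training code.

```python
# Conceptual sketch of a cascaded, domain-wise post-training schedule.
# The stage lists and stub trainers below are illustrative placeholders,
# not the actual Nemotron-Cascade training recipe.

def run_sft_stage(policy, stage):
    # Placeholder: fine-tune `policy` on the SFT data for this stage.
    print(f"SFT stage: {stage}")
    return policy

def run_rl_stage(policy, stage):
    # Placeholder: run RL for this stage (RLHF for alignment, RLVR for
    # verifiable domains), starting from the previous stage's checkpoint.
    print(f"RL stage: {stage}")
    return policy

def cascade_post_training(policy):
    # Multi-stage SFT builds foundational skills first.
    for stage in ["sft_stage_1", "sft_stage_2"]:
        policy = run_sft_stage(policy, stage)
    # Cascade RL: alignment RLHF as a pre-step, then domain-wise RLVR,
    # each stage starting from the policy the previous stage produced.
    for stage in ["rlhf_alignment", "rlvr_domain_1", "rlvr_domain_2"]:
        policy = run_rl_stage(policy, stage)
    return policy

final_policy = cascade_post_training(policy="base_checkpoint")
```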

## Results

- We evaluate our model against competitive reasoning models on a diverse set of benchmarks covering general-knowledge reasoning, alignment and instruction following, mathematical reasoning, competitive programming, software engineering, and tool-use proficiency.
- For Nemotron-Cascade models, we use a maximum generation length of 64K tokens and set the temperature to 0.6 and top-p to 0.95 for reasoning tasks (a minimal generation sketch with these settings follows the results table).
- Nemotron-Cascade-14B-Thinking achieves best-in-class performance on almost all benchmarks. Remarkably, it surpasses DeepSeek-R1-0528 (671B) by a clear margin on the LiveCodeBench (LCB) v5, v6, and Pro benchmarks.

| **Benchmark<br>Metric: Pass@1** | **Qwen3-14B** | **DeepSeek-R1-0528 671B** | **Gemini-2.5-Flash-Thinking** | **Nemotron-Cascade-14B-Thinking** |
| :---- | :---: | :---: | :---: | :---: |
| ***Knowledge Reasoning*** | | | | |
| MMLU | 84.9 | 89.9 | - | 85.1 |
| MMLU Pro | 77.6 | 85.0 | 81.9 | 77.0 |
| GPQA-Diamond | 64.0 | 81.0 | 82.8 | 69.6 |
| ***Alignment*** | | | | |
| ArenaHard | 91.7 | 95.1 | 95.7 | 89.5 |
| IFEval (Strict Prompt) | 85.4 | 84.1 | 89.8 | 81.9 |
| IFBench | 33.7 | 38.0 | 36.1 | 41.7 |
| ***Math*** | | | | |
| AIME 2024 | 79.3 | 91.4 | 82.3 | 89.7 |
| AIME 2025 | 70.4 | 87.5 | 72.0 | 83.3 |
| ***Code*** | | | | |
| LCB v5 (08/24-02/25) | 65.2 | 74.8 | 63.4 | **77.5** |
| LCB v6 (08/24-05/25) | 63.5 | 73.3 | 61.9 | **74.6** |
| LCB Pro 25Q2 (Easy) | 53.6 | 63.9 | 47.4 | **68.9** |
| LCB Pro 25Q2 (Med) | 2.6 | 7.0 | 1.8 | **10.5** |
| SWE Verified (Agentless) | 27.4 | 57.6 | 48.9 | 43.1 |
| ***Tool Calling*** | | | | |
| BFCL V3 | 70.4 | 67.9 | 68.6 | 67.5 |
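
The snippet below is a minimal sketch of running the model with the sampling settings listed above (temperature 0.6, top-p 0.95, up to 64K generated tokens) via Hugging Face `transformers`; the example prompt, dtype, and device placement are illustrative choices, not prescribed by this card.

```python
# Minimal sketch: sampling with the evaluation settings reported above
# (temperature 0.6, top-p 0.95, 64K-token generation budget).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nvidia/Nemotron-Cascade-14B-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative prompt; the chat template appends the " /think" tag.
messages = [{"role": "user", "content": "What is 17 * 23?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    max_new_tokens=65536,  # 64K-token budget used for reasoning benchmarks
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```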

## Evaluation Toolkit

To reproduce our results, please see the evaluation code, scripts, and cached prediction files in [the evaluation README](https://huggingface.co/nvidia/Nemotron-Cascade-14B-Thinking/blob/main/evaluation/README.md).

## Chat Template

Nemotron-Cascade-14B-Thinking follows the Qwen3-style ChatML template and is designed exclusively for the ***thinking*** mode. To align with the template used in [Nemotron-Cascade-8B](https://huggingface.co/nvidia/Nemotron-Cascade-8B), the `" /think"` tag should be appended to the end of the user input. Note that a leading space is included in this tag to ensure correct tokenization.

To reduce the context length in a multi-turn conversation, we include only the final summary of the model's output in the conversation history and change the user turn's `" /think"` tag to `" /no_think"`.

A brief example is shown below:

```python
from transformers import AutoTokenizer

model_name = 'nvidia/Nemotron-Cascade-14B-Thinking'
tokenizer = AutoTokenizer.from_pretrained(model_name)

'''
single-turn example
'''
messages = [
    {"role": "user", "content": "calculate 1+1?"}
]

# only thinking mode is supported (enable_thinking=True)
prompt_thinking = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=True)
# prompt_thinking = '<|im_start|>system\nYou are a helpful and harmless assistant.<|im_end|>\n<|im_start|>user\ncalculate 1+1? /think<|im_end|>\n<|im_start|>assistant\n'


'''
multi-turn example
'''
messages = [
    {"role": "user", "content": "calculate 1+1?"},
    {"role": "assistant", "content": "<think>THINKING_CONTENT</think>\nTo calculate \\(1 + 1\\):\n\n1. **Identify the operation**: This is a basic addition problem involving two integers.\n2. **Perform the addition**: \n \\(1 + 1 = 2\\).\n\n**Result**: \\(\\boxed{2}\\)"},
    {"role": "user", "content": "what about 2+2"}
]

# only thinking mode is supported (enable_thinking=True)
prompt_thinking = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=True)
# prompt_thinking = '<|im_start|>system\nYou are a helpful and harmless assistant.<|im_end|>\n<|im_start|>user\ncalculate 1+1? /no_think<|im_end|>\n<|im_start|>assistant\nTo calculate \\(1 + 1\\):\n\n1. **Identify the operation**: This is a basic addition problem involving two integers.\n2. **Perform the addition**: \n \\(1 + 1 = 2\\).\n\n**Result**: \\(\\boxed{2}\\)<|im_end|>\n<|im_start|>user\nwhat about 2+2 /think<|im_end|>\n<|im_start|>assistant\n'
```
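
As noted above, only the final summary (the text after `</think>`) should be kept when an assistant turn is added back into the conversation history. Depending on the packaged chat template, this stripping may already happen automatically when the template is re-applied; the helper below is a minimal manual sketch, and the example strings are illustrative.

```python
def summary_only(generated_text: str) -> str:
    """Drop the <think>...</think> block and keep only the final summary."""
    if "</think>" in generated_text:
        return generated_text.split("</think>", 1)[1].strip()
    return generated_text.strip()

# Illustrative raw model output and the history built from its summary only.
raw_response = "<think>THINKING_CONTENT</think>\nTo calculate 1 + 1: the result is 2."
history = [
    {"role": "user", "content": "calculate 1+1?"},
    {"role": "assistant", "content": summary_only(raw_response)},
    {"role": "user", "content": "what about 2+2"},
]
```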

## Release Date

Dec 08, 2025

## License

Your use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).

## Citation

```
@article{Nemotron_Cascade_Scaling_Cascaded_Reinforcement_Learning,
  title={Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models},
  author={Wang, Boxin and Lee, Chankyu and Lee, Nayeon and Lin, Sheng-Chieh and Dai, Wenliang and Chen, Yang and Chen, Yangyi and Yang, Zhuolin and Liu, Zihan and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint arXiv:2512.13607},
  year={2025}
}
```