Update README.md
README.md
Stage 3: Reinforcement Learning
* The model underwent multi-environment reinforcement learning using synchronous GRPO (Group Relative Policy Optimization) across math, code, science, instruction following, multi-step tool use, multi-turn conversations, and structured output environments. Conversational quality was further refined through RLHF using a [generative reward model](https://huggingface.co/nvidia/Qwen3-Nemotron-235B-A22B-GenRM). All datasets are disclosed in the *Training, Testing, and Evaluation Datasets* section of this document. The RL environments and datasets are released as part of [NeMo Gym](https://github.com/NVIDIA-NeMo/Gym).
* Software used for reinforcement learning: [NeMo RL](https://github.com/NVIDIA-NeMo/RL), [NeMo Gym](https://github.com/NVIDIA-NeMo/Gym)
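The GRPO variant used here is implemented in NeMo RL; as an illustration of the group-relative idea only, the sketch below shows how GRPO replaces a learned critic with per-group reward normalization. The function name, array shapes, and the toy 0/1 rewards are assumptions for this example, not NeMo RL's API.

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Group-relative advantages, the core idea of GRPO (illustrative sketch).

    For each prompt, a group of G rollouts is sampled; each rollout's
    advantage is its reward normalized by the group's mean and standard
    deviation, so no learned value function (critic) is required.
    rewards: shape (num_prompts, G).
    """
    mean = rewards.mean(axis=-1, keepdims=True)
    std = rewards.std(axis=-1, keepdims=True)
    return (rewards - mean) / (std + eps)

# Toy example: 2 prompts, 4 sampled rollouts each, binary verifier rewards.
rewards = np.array([[1.0, 0.0, 1.0, 0.0],
                    [1.0, 1.0, 1.0, 0.0]])
adv = grpo_advantages(rewards)  # above-group-average rollouts get positive advantage
```

Because the advantage is relative within each group, a rollout is reinforced only when it beats the other samples for the same prompt, which is what makes the method well suited to verifiable environments (math, code, structured output) with sparse rewards.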
The NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 model is the result of the above work.