
1. Introduction

Nanbeige4-3B-Base is a 3B-parameter base model in the fourth-generation Nanbeige LLM family. It demonstrates that even a compact model can achieve strong performance through continuous improvements in data quality and training methodology. When supervised fine-tuning (SFT) is performed on the same training data, our model significantly outperforms open-source models of the same size and even surpasses larger models such as Qwen3-8B. To support research and technological advancement in the open-source community, we have open-sourced the Nanbeige4-3B-Base model together with its technical methodology.

2. Model Summary

Training Data

  • We constructed a comprehensive 23T-token training corpus from web texts, books, code, and papers, meticulously filtered through a hybrid strategy of tagging-based scoring and retrieval-based recall (a simplified sketch of this hybrid filter follows below). This foundation was then augmented with knowledge-dense and reasoning-intensive synthetic data, including Q&A pairs, textbooks, and long chain-of-thought (Long-CoT) traces, which significantly improved downstream task performance.
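
The hybrid filter can be pictured roughly as follows: a document is kept either because a quality tagger scores it highly, or because it is recalled as a near neighbour of curated high-quality seed documents. This is only an illustrative sketch; quality_score, embed, and the thresholds are hypothetical stand-ins, not the actual classifier, embedding model, or settings used in the pipeline.

# Illustrative sketch of a hybrid "tagging-based scoring + retrieval-based recall" filter.
# quality_score() and embed() are hypothetical stand-ins; thresholds are arbitrary examples.
import numpy as np

def quality_score(doc: str) -> float:
    # Hypothetical tagger: in practice a trained quality classifier.
    return min(1.0, len(set(doc.split())) / 100)

def embed(text: str) -> np.ndarray:
    # Hypothetical embedding model; here a toy deterministic-per-text vector.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

def keep(doc: str, seed_vecs: np.ndarray,
         score_thr: float = 0.7, sim_thr: float = 0.8) -> bool:
    """Keep a document if the tagger scores it highly (scoring path)
    or if it is close to a curated high-quality seed (recall path)."""
    if quality_score(doc) >= score_thr:
        return True
    sims = seed_vecs @ embed(doc)           # cosine similarity (unit vectors)
    return float(sims.max()) >= sim_thr

seeds = np.stack([embed(s) for s in ["a curated textbook passage",
                                     "a well-written research abstract"]])
corpus = ["some raw web text ...", "another candidate document ..."]
filtered = [d for d in corpus if keep(d, seeds)]
print(len(filtered))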

Training Recipe

  • We designed an FG-WSD (Fine-Grained Warmup-Stable-Decay) training scheduler that refines the conventional WSD approach. The scheduler is paired with a fine-grained, quality-progressive data curriculum, dividing the Stable stage into multiple phases with progressively improved data mixtures. Compared to vanilla WSD, this method achieved notable performance gains. During the Decay stage, we increased the proportion of math, code, synthetic QA, and synthetic Long-CoT data to further enhance reasoning capabilities. The stage breakdown is shown in the table below, followed by a sketch of the learning-rate curve.
    | Stage                           | Training Tokens | Learning Rate      |
    |---------------------------------|-----------------|--------------------|
    | Warmup Stage                    | 0.1T            | 0 → 4.5e-4         |
    | Diversity-Enriched Stable Stage | 12.4T           | Constant 4.5e-4    |
    | High-Quality Stable Stage       | 6.5T            | Constant 4.5e-4    |
    | Decay and Long-Context Stage    | 4T              | 4.5e-4 → 1.5e-6    |
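
The resulting learning-rate curve can be sketched as a simple piecewise function of the number of training tokens. The token budgets and learning-rate endpoints come from the table above; the linear warmup and linear decay shapes are assumptions for illustration only, and both Stable phases share the same constant rate (only the data mixture changes between them).

# Sketch of the FG-WSD learning-rate curve as a function of tokens seen.
# Token budgets and LR endpoints come from the table above; the linear decay
# shape is an assumption (the card does not specify it).
WARMUP_T = 0.1e12                  # Warmup Stage
STABLE_T = 12.4e12 + 6.5e12        # both Stable phases run at a constant LR
DECAY_T  = 4.0e12                  # Decay and Long-Context Stage
PEAK_LR  = 4.5e-4
FINAL_LR = 1.5e-6

def fg_wsd_lr(tokens_seen: float) -> float:
    """Learning rate after `tokens_seen` training tokens."""
    if tokens_seen < WARMUP_T:                        # linear warmup 0 -> peak
        return PEAK_LR * tokens_seen / WARMUP_T
    if tokens_seen < WARMUP_T + STABLE_T:             # constant across Stable phases
        return PEAK_LR
    frac = min(1.0, (tokens_seen - WARMUP_T - STABLE_T) / DECAY_T)
    return PEAK_LR + frac * (FINAL_LR - PEAK_LR)      # assumed linear decay

for t in (0.05e12, 5e12, 15e12, 20e12, 22.9e12):
    print(f"{t / 1e12:5.2f}T tokens -> lr = {fg_wsd_lr(t):.2e}")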

3. Model Performance

To compare model performance, we fine-tuned our base model and the Qwen-series base models on the same fine-tuning data and evaluated them on downstream tasks. We believe that, for base models, this end-to-end validation better reflects a model's ultimate downstream performance than few-shot evaluation does.

To ensure a fair comparison, we conducted experiments with three distinct datasets: Nemotron-Dataset-v1, Ring-lite-sft-data, and OpenThoughts3. For each dataset, we randomly selected 500k training samples for the SFT experiments (see the sampling sketch below).
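
The per-dataset sampling can be reproduced roughly as follows with the Hugging Face datasets library. The repository IDs and the split name below are placeholders, not the exact identifiers of the three datasets.

# Rough sketch of drawing 500k random SFT samples per dataset.
# The repo IDs are placeholders; substitute the actual Hugging Face identifiers.
from datasets import load_dataset

SFT_SOURCES = [
    "org/nemotron-dataset-v1",   # placeholder ID
    "org/ring-lite-sft-data",    # placeholder ID
    "org/openthoughts3",         # placeholder ID
]

def sample_sft_subset(repo_id: str, n: int = 500_000, seed: int = 42):
    ds = load_dataset(repo_id, split="train")     # split name assumed
    ds = ds.shuffle(seed=seed)                    # fixed seed for reproducibility
    return ds.select(range(min(n, len(ds))))      # random 500k subset

subsets = {repo: sample_sft_subset(repo) for repo in SFT_SOURCES}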

  • Finetuned with Nemotron-Dataset-v1

    | Model             | AIME2024 | AIME2025 | Math-500 | GPQA |
    |-------------------|----------|----------|----------|------|
    | Qwen3-4B-Base     | 24.6     | 25.0     | 90.4     | 44.6 |
    | Qwen3-8B-Base     | 37.9     | 29.6     | 91.1     | 48.9 |
    | Nanbeige4-3B-Base | 52.9     | 40.8     | 93.4     | 53.4 |
  • Finetuned with Ring-lite-sft-data

    | Model             | AIME2024 | AIME2025 | Math-500 | GPQA |
    |-------------------|----------|----------|----------|------|
    | Qwen3-4B-Base     | 40.4     | 31.3     | 93.6     | 51.4 |
    | Qwen3-8B-Base     | 50.0     | 35.8     | 94.4     | 55.1 |
    | Nanbeige4-3B-Base | 56.8     | 45.3     | 95.5     | 57.7 |
  • Finetuned with OpenThoughts3

    | Model             | AIME2024 | AIME2025 | Math-500 | GPQA |
    |-------------------|----------|----------|----------|------|
    | Qwen3-4B-Base     | 52.9     | 42.1     | 93.2     | 49.6 |
    | Qwen3-8B-Base     | 60.4     | 47.1     | 95.0     | 55.3 |
    | Nanbeige4-3B-Base | 62.4     | 49.2     | 94.6     | 56.9 |

The results demonstrate that Nanbeige4-3B-Base significantly outperforms Qwen3-4B-Base, and even surpasses the larger Qwen3-8B-Base, highlighting the greater potential of our base model after fine-tuning. This advantage stems from the optimized training recipe during our Stable stage and the extensive high-quality synthetic data incorporated during the Decay stage.

4. Quickstart

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model (device_map='auto' places the weights on GPU if available).
tokenizer = AutoTokenizer.from_pretrained(
  'Nanbeige/Nanbeige4-3B-Base',
  use_fast=False,
  trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
  'Nanbeige/Nanbeige4-3B-Base',
  torch_dtype='auto',
  device_map='auto',
  trust_remote_code=True
)

# Nanbeige4-3B-Base is a base (non-chat) model, so it is prompted with plain
# text for completion rather than a chat-formatted message list.
prompt = "中国的首都是"  # "The capital of China is"
input_ids = tokenizer(prompt, return_tensors='pt').input_ids.to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=64)
resp = tokenizer.decode(output_ids[0][len(input_ids[0]):], skip_special_tokens=True)
print(resp)

5. Limitations

While we place great emphasis on model safety during training and strive to ensure that its outputs align with ethical and legal requirements, the model's size and probabilistic nature mean that unexpected outputs cannot be completely avoided. These outputs may include harmful content such as bias or discrimination. Please do not propagate such content. We do not assume any responsibility for the consequences resulting from the dissemination of inappropriate information.

6. Citation

If you find our model useful or use it in your projects, please cite this Hugging Face project.

7. Contact

If you have any questions, please raise an issue or contact us at nanbeige@126.com.
