Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM


Model Description

Xmodel-2.5 is a 1.3 billion parameter small language model specifically designed as a lightweight agent core for complex reasoning tasks. The model builds upon Xmodel-2 with four key upgrades:

  1. Full μP Support: Extended Megatron-LM to support Maximal Update Parametrization (μP) for reliable hyperparameter transfer from a small proxy model (see the sketch after this list)
  2. Efficient Tokenizer: Adopted the 129K-token DeepSeek-V3 tokenizer for a better compression rate and faster decoding
  3. FP8 Mixed Precision: Used the E4M3 format for the forward pass and E5M2 for the backward pass to balance precision and throughput
  4. Optimizer Scheduling: Switched from AdamW to Muon during the decay phase, significantly improving downstream task performance
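
The appeal of μP is that hyperparameters tuned on a small proxy transfer to the full-width model after a simple width-dependent rescaling. Below is a minimal sketch of the Adam-variant learning-rate rule; the proxy width and base LR are assumed values for illustration, not taken from the paper or from the authors' Megatron-LM code.

```python
# Illustrative sketch of muP-style LR transfer (Adam variant), not the authors'
# Megatron-LM implementation. BASE_WIDTH and the base LR are assumed values.

BASE_WIDTH = 256      # hidden size of the small proxy model (assumed)
TARGET_WIDTH = 1536   # hidden size of Xmodel-2.5

def mup_transfer_lr(base_lr: float, param_kind: str) -> float:
    """Rescale an LR tuned on the proxy to the full-width model under muP."""
    m = TARGET_WIDTH / BASE_WIDTH          # width multiplier
    if param_kind == "matrix":             # hidden weight matrices: LR ~ 1/m
        return base_lr / m
    return base_lr                         # vector-like params keep the tuned LR

print(mup_transfer_lr(1e-2, "matrix"))     # 0.01 / 6 ≈ 0.00167
```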

Trained on only 1.4T tokens, Xmodel-2.5 achieves 52.49% average accuracy across 13 reasoning benchmarks, ranking second among 1-2B-parameter models, behind only Qwen3-1.7B (56.96%) while using 25.7x fewer training tokens.

Model Architecture

| Hyperparameter          | Value  |
|-------------------------|--------|
| Hidden size             | 1536   |
| Intermediate size       | 3840   |
| Transformer layers      | 48     |
| Attention heads (Q)     | 24     |
| KV heads (GQA)          | 8      |
| Sequence length         | 3712   |
| Max position embeddings | 131072 |
| RoPE base               | 500000 |
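
A few quantities implied by the table can be derived directly; the snippet below only uses values from the table above.

```python
# Derived quantities from the architecture table above.

hidden_size  = 1536
num_q_heads  = 24
num_kv_heads = 8

head_dim  = hidden_size // num_q_heads    # 1536 / 24 = 64
gqa_group = num_q_heads // num_kv_heads   # 3 query heads share each KV head

# Relative to full multi-head attention, GQA shrinks the KV cache by the
# grouping factor:
kv_cache_reduction = num_q_heads / num_kv_heads   # 3.0x

print(head_dim, gqa_group, kv_cache_reduction)
```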

Intended Uses & Limitations

Intended Uses

  • Complex reasoning tasks
  • Lightweight AI agent applications
  • Educational and research purposes
  • Resource-constrained environments

Limitations

  • Capacity constrained by the 1.3B-parameter scale
  • May struggle with highly specialized domains
  • Performance may vary on non-English languages

Training Details

Training Strategy

  • Three-stage WSD (warmup-stable-decay) curriculum: 560k steps, 1.4T tokens (a schedule sketch follows this list)
  • Warmup phase: 2k steps with a linear learning-rate ramp
  • Stable phase: 530k steps with a gradually increasing batch size
  • Decay phase: 20k steps, mixing in 66.9% high-quality SFT data
  • Long-context adaptation: 10k additional steps to extend context support to 16K
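
For intuition, here is a minimal sketch of a WSD learning-rate schedule using the phase lengths above; the peak learning rate and the linear decay shape are assumptions for illustration, not values from the paper.

```python
# Minimal WSD (warmup-stable-decay) schedule matching the step counts above.
# PEAK_LR and the linear decay shape are illustrative assumptions.

WARMUP, STABLE, DECAY = 2_000, 530_000, 20_000
PEAK_LR = 3e-4  # assumed, not from the paper

def wsd_lr(step: int) -> float:
    if step < WARMUP:                         # linear warmup
        return PEAK_LR * step / WARMUP
    if step < WARMUP + STABLE:                # constant plateau
        return PEAK_LR
    t = (step - WARMUP - STABLE) / DECAY      # fraction of decay completed
    return PEAK_LR * max(0.0, 1.0 - t)        # linear decay to zero

for s in (0, 1_000, 2_000, 300_000, 542_000, 552_000):
    print(s, round(wsd_lr(s), 6))
```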

Key Innovations

  • μP hyperparameter transfer: Direct transfer from a 20M-parameter proxy model to the full 1.3B model
  • Optimizer switching: AdamW → Muon during the decay phase for improved reasoning performance
  • FP8 mixed precision: E4M3/E5M2 FP8 formats substantially improve training throughput while preserving precision (see the demo after this list)
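
PyTorch (2.1+) exposes both FP8 formats directly, which makes the precision/range trade-off easy to inspect. This round-trip demo is illustrative only; it is not the training code.

```python
# Round-trip demo of the two FP8 formats (requires PyTorch >= 2.1). E4M3 has
# more mantissa bits (finer precision, narrower range) and is used for the
# forward pass; E5M2 has more exponent bits (wider range for gradients).
import torch

x = torch.randn(4)
fwd = x.to(torch.float8_e4m3fn)   # 4 exponent bits, 3 mantissa bits
bwd = x.to(torch.float8_e5m2)     # 5 exponent bits, 2 mantissa bits

print(x)
print(fwd.to(torch.float32))      # E4M3 round-trip: small quantization error
print(bwd.to(torch.float32))      # E5M2 round-trip: coarser steps, wider range
```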

Performance

Comprehensive Reasoning Performance

| Model            | Parameters | Training Tokens | 13-Task Average |
|------------------|------------|-----------------|-----------------|
| Qwen3-1.7B       | 1.7B       | 36T             | 56.96%          |
| Xmodel-2.5       | 1.3B       | 1.4T            | 52.49%          |
| Xmodel-2-1.2B    | 1.2B       | 1.5T            | 50.34%          |
| InternLM2.5-1.8B | 1.8B       | -               | 50.19%          |
| MiniCPM-1B       | 1B         | -               | 48.95%          |
| SmolLM2-1.7B     | 1.7B       | 11T             | 46.88%          |
| Llama-3.2-1B     | 1B         | 9T              | 44.72%          |

Detailed Task Performance

| Task          | Xmodel-2.5 | Xmodel-2 | Improvement |
|---------------|------------|----------|-------------|
| ARC-Challenge | 48.89      | 46.16    | +2.73       |
| ARC-Easy      | 76.94      | 76.22    | +0.72       |
| PIQA          | 75.95      | 75.14    | +0.81       |
| HellaSwag     | 67.24      | 64.05    | +3.19       |
| WinoGrande    | 64.64      | 64.25    | +0.39       |
| BBH           | 54.58      | 48.90    | +5.68       |
| MMLU          | 51.81      | 49.98    | +1.83       |
| GSM8k         | 58.98      | 56.56    | +2.42       |
| MATH          | 28.94      | 25.64    | +3.30       |
| HumanEval     | 28.66      | 29.27    | -0.61       |
| MBPP          | 33.00      | 30.80    | +2.20       |
| CMMLU         | 47.16      | 44.29    | +2.87       |
| C-Eval        | 45.54      | 43.16    | +2.38       |

How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "XiaoduoAILab/Xmodel-2.5"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True
)

prompt = "Explain the concept of transfer learning in machine learning."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate a response with nucleus sampling
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id
)

# Decode only the newly generated tokens (strip the prompt)
output = tokenizer.decode(
    generated_ids[0][len(model_inputs.input_ids[0]):],
    skip_special_tokens=True
)
print("Generated Response:")
print(output)
```

Citation

If you find Xmodel-2.5 useful for your research or applications, please consider citing our work:

```bibtex
@misc{liu2025xmodel25,
      title={Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM},
      author={Yang Liu and Xiaolong Zhong and Ling Jiang},
      year={2025},
      eprint={2511.19496},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2511.19496}
}
```

Contact

For questions or suggestions, please open an issue on the project's GitHub repository.

License

This project is licensed under the Apache-2.0 License. See the LICENSE file for details.
