# Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM
## Model Description
Xmodel-2.5 is a 1.3-billion-parameter small language model designed as a lightweight agent core for complex reasoning tasks. It builds on Xmodel-2 with four key upgrades:

- **Full μP support**: extended Megatron-LM to support maximal update parameterization (μP) for reliable hyperparameter transfer
- **Efficient tokenizer**: adopted the 129K-vocabulary DeepSeek-V3 tokenizer for a better compression ratio and faster decoding
- **FP8 mixed precision**: used the E4M3 format in the forward pass and E5M2 in the backward pass to balance precision and throughput (see the sketch below)
- **Optimizer scheduling**: switched from AdamW to Muon during the decay phase, significantly improving downstream task performance
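The two FP8 formats trade mantissa precision against exponent range. As a minimal sketch (assuming PyTorch 2.1+, which exposes both formats as dtypes), inspecting their numeric limits shows why E4M3 suits forward activations and E5M2 suits backward gradients:

```python
import torch

# E4M3 (4 exponent bits, 3 mantissa bits): finer steps, narrower range.
# Used in the forward pass, where activation magnitudes are relatively bounded.
e4m3 = torch.finfo(torch.float8_e4m3fn)

# E5M2 (5 exponent bits, 2 mantissa bits): coarser steps, much wider range.
# Used in the backward pass, where gradient magnitudes vary more widely.
e5m2 = torch.finfo(torch.float8_e5m2)

for name, info in [("E4M3", e4m3), ("E5M2", e5m2)]:
    print(f"{name}: max={info.max}, min normal={info.tiny}, eps={info.eps}")
# E4M3: max=448.0,   min normal=0.015625,       eps=0.125
# E5M2: max=57344.0, min normal=6.103515625e-05, eps=0.25
```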
Trained on only 1.4T tokens, Xmodel-2.5 achieves 52.49% average accuracy across 13 reasoning benchmarks, ranking second among 1-2B-parameter models behind only Qwen3-1.7B (56.96%), which was trained on 25.7× more tokens.
## Model Architecture
| Hyperparameter | Value |
|---|---|
| Hidden size | 1536 |
| Intermediate size | 3840 |
| Transformer layers | 48 |
| Attention heads (Q) | 24 |
| KV heads (GQA) | 8 |
| Sequence length | 3712 |
| Max position embeddings | 131072 |
| RoPE base | 500000 |
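As a sanity check on the 1.3B headline, here is a back-of-the-envelope parameter estimate from the table. Everything beyond the table itself, namely the SwiGLU MLP, tied input/output embeddings, and the ~129K vocabulary size, is an assumption:

```python
# Rough parameter-count estimate from the hyperparameter table above.
# SwiGLU MLP, tied embeddings, and the vocab size are assumptions,
# not values stated in the table.
hidden, inter, layers = 1536, 3840, 48
q_heads, kv_heads = 24, 8
head_dim = hidden // q_heads            # 64
vocab = 129_000                         # ~129K DeepSeek-V3 tokenizer (approximate)

attn = (hidden * q_heads * head_dim         # Q projection
        + 2 * hidden * kv_heads * head_dim  # K and V (GQA: 8 shared KV heads)
        + q_heads * head_dim * hidden)      # output projection
mlp = 3 * hidden * inter                    # gate, up, and down projections (SwiGLU)
per_layer = attn + mlp                      # norms add a negligible ~3k per layer

total = layers * per_layer + vocab * hidden  # plus one tied embedding matrix
print(f"~{total / 1e9:.2f}B parameters")     # ~1.35B, consistent with the 1.3B headline
```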
## Intended Uses & Limitations

### Intended Uses
- Complex reasoning tasks
- Lightweight AI agent applications
- Educational and research purposes
- Resource-constrained environments
### Limitations
- Limited to 1.3B parameter capacity
- May struggle with highly specialized domains
- Performance may vary on non-English languages
## Training Details

### Training Strategy
- Three-stage WSD (warmup-stable-decay) curriculum: 560k steps, 1.4T tokens (a learning-rate sketch follows this list)
- Warmup phase: 2k steps with a linear learning-rate increase
- Stable phase: 530k steps with a gradually increasing batch size
- Decay phase: 20k steps, mixing in 66.9% high-quality SFT data
- Long-context adaptation: 10k additional steps for 16K-context support
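A minimal sketch of the WSD learning-rate shape described above. Only the 2k/530k/20k phase lengths come from this card; the peak learning rate and the cosine decay form are illustrative assumptions:

```python
import math

# WSD (warmup-stable-decay) learning-rate schedule.
# Phase lengths (2k/530k/20k) are from the card; PEAK_LR and the cosine
# decay shape are illustrative assumptions, not the paper's values.
WARMUP, STABLE, DECAY = 2_000, 530_000, 20_000
PEAK_LR = 1e-2  # hypothetical peak learning rate

def wsd_lr(step: int) -> float:
    if step < WARMUP:                        # linear warmup
        return PEAK_LR * step / WARMUP
    if step < WARMUP + STABLE:               # constant plateau
        return PEAK_LR
    t = min((step - WARMUP - STABLE) / DECAY, 1.0)  # decay progress in [0, 1]
    return PEAK_LR * 0.5 * (1 + math.cos(math.pi * t))  # cosine decay to ~0

for s in (0, 1_000, 2_000, 300_000, 542_000, 552_000):
    print(s, f"{wsd_lr(s):.2e}")
```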
### Key Innovations

- μP hyperparameter transfer: hyperparameters tuned on a 20M-parameter proxy model transfer directly to the full model (a minimal sketch follows this list)
- Optimizer switching: AdamW → Muon during the decay phase for improved reasoning performance
- FP8 mixed precision: FP8 formats significantly improve training efficiency
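Under μP, the optimal learning rate is approximately width-independent, so a value swept on the small proxy carries over to the full model. A minimal sketch of the Adam-style μP learning-rate rule (the proxy width of 256 and both learning rates are illustrative assumptions; Xmodel-2.5's actual μP setup lives in its Megatron-LM extension):

```python
# μP-style per-parameter-group learning rates (Adam variant).
# Matrix-like ("hidden") weights get lr scaled by base_width / width,
# so a learning rate tuned on a narrow proxy transfers to the wide model.
BASE_WIDTH = 256      # hypothetical proxy hidden size (~20M-param scale)
WIDTH = 1536          # Xmodel-2.5 hidden size
TUNED_LR = 1e-2       # hypothetical lr found by sweeping on the proxy

def mup_lr(param_kind: str) -> float:
    if param_kind == "hidden":   # weight matrices whose fan-in grows with width
        return TUNED_LR * BASE_WIDTH / WIDTH
    return TUNED_LR              # embeddings, biases, norms: unscaled

print(mup_lr("hidden"))     # ~1.67e-03 for the full-width model
print(mup_lr("embedding"))  # 1e-02, same value tuned on the proxy
```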
## Performance

### Comprehensive Reasoning Performance
| Model | Parameters | Training Tokens | 13-Task Average |
|---|---|---|---|
| Qwen3-1.7B | 1.7B | 36T | 56.96% |
| Xmodel-2.5 | 1.3B | 1.4T | 52.49% |
| Xmodel-2-1.2B | 1.2B | 1.5T | 50.34% |
| InternLM2.5-1.8B | 1.8B | - | 50.19% |
| MiniCPM-1B | 1B | - | 48.95% |
| SmolLM2-1.7B | 1.7B | 11T | 46.88% |
| Llama-3.2-1B | 1B | 9T | 44.72% |
### Detailed Task Performance
| Task | Xmodel-2.5 | Xmodel-2 | Improvement |
|---|---|---|---|
| ARC-Challenge | 48.89 | 46.16 | +2.73 |
| ARC-Easy | 76.94 | 76.22 | +0.72 |
| PIQA | 75.95 | 75.14 | +0.81 |
| HellaSwag | 67.24 | 64.05 | +3.19 |
| WinoGrande | 64.64 | 64.25 | +0.39 |
| BBH | 54.58 | 48.90 | +5.68 |
| MMLU | 51.81 | 49.98 | +1.83 |
| GSM8k | 58.98 | 56.56 | +2.42 |
| MATH | 28.94 | 25.64 | +3.30 |
| HumanEval | 28.66 | 29.27 | -0.61 |
| MBPP | 33.00 | 30.80 | +2.20 |
| CMMLU | 47.16 | 44.29 | +2.87 |
| C-Eval | 45.54 | 43.16 | +2.38 |
## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "XiaoduoAILab/Xmodel-2.5"

# Load the model and tokenizer (trust_remote_code is required for the
# custom model code shipped with the checkpoint).
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Build a chat-formatted prompt.
prompt = "Explain the concept of transfer learning in machine learning."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate a response with nucleus sampling.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, skipping the prompt.
output = tokenizer.decode(
    generated_ids[0][len(model_inputs.input_ids[0]):],
    skip_special_tokens=True,
)
print("Generated Response:")
print(output)
```
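The sampling settings above (`temperature=0.7`, `top_p=0.9`) are this card's example values rather than tuned recommendations; for deterministic, reproducible output you can drop them and pass `do_sample=False` to `model.generate`.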
## Citation
If you find Xmodel-2.5 useful for your research or applications, please consider citing our work:
```bibtex
@misc{liu2025xmodel25,
      title={Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM},
      author={Yang Liu and Xiaolong Zhong and Ling Jiang},
      year={2025},
      eprint={2511.19496},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2511.19496},
}
```
## Contact
For questions or suggestions, please contact us through:
- GitHub Issues: open an issue on the Xmodel-2.5 repository
- Email: foamilu@yeah.net
## License
This project is licensed under the Apache-2.0 License. See the LICENSE file for details.