Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM


Model Description

Xmodel-2.5 is a 1.3 billion parameter small language model specifically designed as a lightweight agent core for complex reasoning tasks. The model builds upon Xmodel-2 with four key upgrades:

  1. Full μP Support: Extended Megatron-LM to support Maximal Update Parametrization (μP) for reliable hyperparameter transfer from a small proxy model (see the sketch after this list)
  2. Efficient Tokenizer: Adopted the 129K-token DeepSeek-V3 tokenizer for a better compression rate and faster decoding
  3. FP8 Mixed Precision: Used the E4M3 format for the forward pass and E5M2 for the backward pass to balance precision and throughput
  4. Optimizer Scheduling: Switched from AdamW to Muon during the decay phase, significantly improving downstream task performance
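
The appeal of μP is that hyperparameters tuned on a small proxy transfer to the full-width model after a simple width-dependent rescaling. Below is a minimal sketch of the Adam-variant learning-rate rule; the proxy width and base LR are assumed values for illustration, not taken from the paper or from the authors' Megatron-LM code.

```python
# Illustrative sketch of muP-style LR transfer (Adam variant), not the authors'
# Megatron-LM implementation. BASE_WIDTH and the base LR are assumed values.

BASE_WIDTH = 256      # hidden size of the small proxy model (assumed)
TARGET_WIDTH = 1536   # hidden size of Xmodel-2.5

def mup_transfer_lr(base_lr: float, param_kind: str) -> float:
    """Rescale an LR tuned on the proxy to the full-width model under muP."""
    m = TARGET_WIDTH / BASE_WIDTH          # width multiplier
    if param_kind == "matrix":             # hidden weight matrices: LR ~ 1/m
        return base_lr / m
    return base_lr                         # vector-like params keep the tuned LR

print(mup_transfer_lr(1e-2, "matrix"))     # 0.01 / 6 ≈ 0.00167
```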

Trained on only 1.4T tokens, Xmodel-2.5 achieves 52.49% average accuracy across 13 reasoning benchmarks, ranking second among 1-2B-parameter models, behind only Qwen3-1.7B (56.96%) while using 25.7x fewer training tokens.

Model Architecture

| Hyperparameter          | Value  |
|-------------------------|--------|
| Hidden size             | 1536   |
| Intermediate size       | 3840   |
| Transformer layers      | 48     |
| Attention heads (Q)     | 24     |
| KV heads (GQA)          | 8      |
| Sequence length         | 3712   |
| Max position embeddings | 131072 |
| RoPE base               | 500000 |
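
A few quantities implied by the table can be derived directly; the snippet below only uses values from the table above.

```python
# Derived quantities from the architecture table above.

hidden_size  = 1536
num_q_heads  = 24
num_kv_heads = 8

head_dim  = hidden_size // num_q_heads    # 1536 / 24 = 64
gqa_group = num_q_heads // num_kv_heads   # 3 query heads share each KV head

# Relative to full multi-head attention, GQA shrinks the KV cache by the
# grouping factor:
kv_cache_reduction = num_q_heads / num_kv_heads   # 3.0x

print(head_dim, gqa_group, kv_cache_reduction)
```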

Intended Uses & Limitations

Intended Uses

  • Complex reasoning tasks
  • Lightweight AI agent applications
  • Educational and research purposes
  • Resource-constrained environments

Limitations

  • Capacity constrained by the 1.3B-parameter scale
  • May struggle with highly specialized domains
  • Performance may vary on non-English languages

Training Details

Training Strategy

  • Three-stage WSD (warmup-stable-decay) curriculum: 560k steps, 1.4T tokens (a schedule sketch follows this list)
  • Warmup phase: 2k steps with a linear learning-rate ramp
  • Stable phase: 530k steps with a gradually increasing batch size
  • Decay phase: 20k steps, mixing in 66.9% high-quality SFT data
  • Long-context adaptation: 10k additional steps to extend context support to 16K
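
For intuition, here is a minimal sketch of a WSD learning-rate schedule using the phase lengths above; the peak learning rate and the linear decay shape are assumptions for illustration, not values from the paper.

```python
# Minimal WSD (warmup-stable-decay) schedule matching the step counts above.
# PEAK_LR and the linear decay shape are illustrative assumptions.

WARMUP, STABLE, DECAY = 2_000, 530_000, 20_000
PEAK_LR = 3e-4  # assumed, not from the paper

def wsd_lr(step: int) -> float:
    if step < WARMUP:                         # linear warmup
        return PEAK_LR * step / WARMUP
    if step < WARMUP + STABLE:                # constant plateau
        return PEAK_LR
    t = (step - WARMUP - STABLE) / DECAY      # fraction of decay completed
    return PEAK_LR * max(0.0, 1.0 - t)        # linear decay to zero

for s in (0, 1_000, 2_000, 300_000, 542_000, 552_000):
    print(s, round(wsd_lr(s), 6))
```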

Key Innovations

  • μP hyperparameter transfer: Direct transfer from a 20M-parameter proxy model to the full 1.3B model
  • Optimizer switching: AdamW → Muon during the decay phase for improved reasoning performance
  • FP8 mixed precision: E4M3/E5M2 FP8 formats substantially improve training throughput while preserving precision (see the demo after this list)
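
PyTorch (2.1+) exposes both FP8 formats directly, which makes the precision/range trade-off easy to inspect. This round-trip demo is illustrative only; it is not the training code.

```python
# Round-trip demo of the two FP8 formats (requires PyTorch >= 2.1). E4M3 has
# more mantissa bits (finer precision, narrower range) and is used for the
# forward pass; E5M2 has more exponent bits (wider range for gradients).
import torch

x = torch.randn(4)
fwd = x.to(torch.float8_e4m3fn)   # 4 exponent bits, 3 mantissa bits
bwd = x.to(torch.float8_e5m2)     # 5 exponent bits, 2 mantissa bits

print(x)
print(fwd.to(torch.float32))      # E4M3 round-trip: small quantization error
print(bwd.to(torch.float32))      # E5M2 round-trip: coarser steps, wider range
```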

Performance

Comprehensive Reasoning Performance

| Model            | Parameters | Training Tokens | 13-Task Average |
|------------------|------------|-----------------|-----------------|
| Qwen3-1.7B       | 1.7B       | 36T             | 56.96%          |
| Xmodel-2.5       | 1.3B       | 1.4T            | 52.49%          |
| Xmodel-2-1.2B    | 1.2B       | 1.5T            | 50.34%          |
| InternLM2.5-1.8B | 1.8B       | -               | 50.19%          |
| MiniCPM-1B       | 1B         | -               | 48.95%          |
| SmolLM2-1.7B     | 1.7B       | 11T             | 46.88%          |
| Llama-3.2-1B     | 1B         | 9T              | 44.72%          |

Detailed Task Performance

| Task          | Xmodel-2.5 | Xmodel-2 | Improvement |
|---------------|------------|----------|-------------|
| ARC-Challenge | 48.89      | 46.16    | +2.73       |
| ARC-Easy      | 76.94      | 76.22    | +0.72       |
| PIQA          | 75.95      | 75.14    | +0.81       |
| HellaSwag     | 67.24      | 64.05    | +3.19       |
| WinoGrande    | 64.64      | 64.25    | +0.39       |
| BBH           | 54.58      | 48.90    | +5.68       |
| MMLU          | 51.81      | 49.98    | +1.83       |
| GSM8k         | 58.98      | 56.56    | +2.42       |
| MATH          | 28.94      | 25.64    | +3.30       |
| HumanEval     | 28.66      | 29.27    | -0.61       |
| MBPP          | 33.00      | 30.80    | +2.20       |
| CMMLU         | 47.16      | 44.29    | +2.87       |
| C-Eval        | 45.54      | 43.16    | +2.38       |

How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "XiaoduoAILab/Xmodel-2.5"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True
)

prompt = "Explain the concept of transfer learning in machine learning."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate a response with nucleus sampling
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id
)

# Decode only the newly generated tokens (strip the prompt)
output = tokenizer.decode(
    generated_ids[0][len(model_inputs.input_ids[0]):],
    skip_special_tokens=True
)
print("Generated Response:")
print(output)
```

Citation

If you find Xmodel-2.5 useful for your research or applications, please consider citing our work:

```bibtex
@misc{liu2025xmodel25,
      title={Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM},
      author={Yang Liu and Xiaolong Zhong and Ling Jiang},
      year={2025},
      eprint={2511.19496},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2511.19496}
}
```

Contact

For questions or suggestions, please open an issue on the project's GitHub repository.

License

This project is licensed under the Apache-2.0 License. See the LICENSE file for details.
