Are Large Reasoning Models Good Translation Evaluators? Analysis and Performance Boost


Metrics

| Metric/Model | Avg. | En-De SPA (%) | En-De $Acc^*_{eq}$ | En-Es SPA (%) | En-Es $Acc^*_{eq}$ | Ja-Zh SPA (%) | Ja-Zh $Acc^*_{eq}$ |
|---|---|---|---|---|---|---|---|
| QwQ 32B | 68.3 | 79.8 | 46.8 | 76.1 | 68.0 | 91.9 | 46.9 |
| + ThinMQM | 72.2 (+3.9) | 83.2 (+3.4) | 52.5 (+5.7) | 80.7 (+4.6) | 69.2 (+1.2) | 91.3 (−0.6) | 56.1 (+9.2) |
| R1-Distill-Llama-8B | 64.9 | 71.8 | 42.9 | 78.5 | 68.0 | 84.7 | 43.5 |
| + ThinMQM | 70.8 (+5.9) | 85.5 (+13.7) | 48.6 (+5.7) | 81.3 (+2.8) | 68.2 (+0.2) | 90.5 (+5.8) | 51.0 (+7.5) |
| R1-Distill-Qwen-7B | 61.1 | 67.3 | 42.9 | 61.0 | 68.0 | 83.8 | 43.5 |
| + ThinMQM | 69.8 (+8.7) | 84.5 (+17.2) | 48.5 (+5.6) | 77.8 (+16.8) | 68.0 (+0.0) | 89.0 (+5.2) | 51.3 (+7.8) |
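
The sketch below is a quick sanity check on the table above, not part of the released evaluation code. It assumes that the Avg. column is the unweighted mean of the six per-language-pair scores; the card does not state this, so the check uses a tolerance of 0.1 to allow for rounding of the published values. The names `SCORES`, `check_averages`, and `avg_gain` are illustrative only.

```python
# Scores copied from the table above. Column order:
# En-De SPA, En-De Acc*_eq, En-Es SPA, En-Es Acc*_eq, Ja-Zh SPA, Ja-Zh Acc*_eq
SCORES = {
    "QwQ 32B": ([79.8, 46.8, 76.1, 68.0, 91.9, 46.9], 68.3),
    "QwQ 32B + ThinMQM": ([83.2, 52.5, 80.7, 69.2, 91.3, 56.1], 72.2),
    "R1-Distill-Llama-8B": ([71.8, 42.9, 78.5, 68.0, 84.7, 43.5], 64.9),
    "R1-Distill-Llama-8B + ThinMQM": ([85.5, 48.6, 81.3, 68.2, 90.5, 51.0], 70.8),
    "R1-Distill-Qwen-7B": ([67.3, 42.9, 61.0, 68.0, 83.8, 43.5], 61.1),
    "R1-Distill-Qwen-7B + ThinMQM": ([84.5, 48.5, 77.8, 68.0, 89.0, 51.3], 69.8),
}


def check_averages(scores, tol=0.1):
    """Map each row to True if the mean of its six scores is within tol of Avg.

    Assumption: Avg. is an unweighted mean over the six columns.
    """
    result = {}
    for name, (values, reported_avg) in scores.items():
        recomputed = sum(values) / len(values)
        result[name] = abs(recomputed - reported_avg) < tol
    return result


def avg_gain(scores, baseline):
    """Difference in reported Avg. between a baseline and its '+ ThinMQM' row."""
    tuned = baseline + " + ThinMQM"
    return round(scores[tuned][1] - scores[baseline][1], 1)


if __name__ == "__main__":
    for name, ok in check_averages(SCORES).items():
        print(f"{name}: average consistent = {ok}")
    for base in ("QwQ 32B", "R1-Distill-Llama-8B", "R1-Distill-Qwen-7B"):
        print(f"{base}: Avg. gain = {avg_gain(SCORES, base)}")
```

The recomputed gains can be compared against the parenthesized deltas in the Avg. column.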

Model & Data Card

| Released Models | HF Model | Template | Trained Dataset |
|---|---|---|---|
| rzzhan/ThinMQM-32B | https://huggingface.co/rzzhan/ThinMQM-32B | thinking | thinmqm12k_src (https://huggingface.co/datasets/rzzhan/ThinMQM-12k/) |
| rzzhan/ThinMQM-8B | https://huggingface.co/rzzhan/ThinMQM-8B | thinking_ref | thinmqm12k_ref (https://huggingface.co/datasets/rzzhan/ThinMQM-12k/) |
| rzzhan/ThinMQM-7B | https://huggingface.co/rzzhan/ThinMQM-7B | thinking_ref | thinmqm12k_ref (https://huggingface.co/datasets/rzzhan/ThinMQM-12k/) |
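
As a minimal sketch of how the table above might be used, the snippet below records each released checkpoint with its template and training-data label, and loads a checkpoint through the standard `transformers` auto classes. The names `RELEASED`, `template_for`, and `load_checkpoint` are hypothetical. The prompt text behind each template name is not given in this card and is not reproduced here, and loading assumes the checkpoints are ordinary causal language models on the Hub.

```python
# Rows copied from the table above. Only the repo ID, template name, and
# dataset label come from the card; everything else here is illustrative.
RELEASED = {
    "rzzhan/ThinMQM-32B": {"template": "thinking", "dataset": "thinmqm12k_src"},
    "rzzhan/ThinMQM-8B": {"template": "thinking_ref", "dataset": "thinmqm12k_ref"},
    "rzzhan/ThinMQM-7B": {"template": "thinking_ref", "dataset": "thinmqm12k_ref"},
}


def template_for(repo_id: str) -> str:
    """Return the template name listed for a released checkpoint."""
    if repo_id not in RELEASED:
        raise KeyError(f"{repo_id} is not listed in the model card table")
    return RELEASED[repo_id]["template"]


def load_checkpoint(repo_id: str):
    """Download and load a listed checkpoint (needs `transformers` and network).

    Assumption: the checkpoint loads with AutoModelForCausalLM.
    """
    template_for(repo_id)  # raises KeyError for unlisted repo IDs
    # Imported lazily so the lookup helpers work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model


if __name__ == "__main__":
    print(template_for("rzzhan/ThinMQM-7B"))
```

Calling `load_checkpoint("rzzhan/ThinMQM-7B")` would then return the tokenizer and model, and the template name indicates which prompt format to pair with that checkpoint.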

📝 Citation

If you find our model, data, or evaluation code useful, please cite our paper:

@article{zhan2025thinmqm,
      title={Are Large Reasoning Models Good Translation Evaluators? Analysis and Performance Boost}, 
      author={Zhan, Runzhe and Huang, Zhihong and Yang, Xinyi and Chao, Lidia S and Yang, Min and Wong, Derek F},
      year={2025},
      journal = {ArXiv preprint},
      volume = {2510.20780},
      url={https://arxiv.org/abs/2510.20780}, 
}

📬 Contact

For questions, feedback, or collaboration opportunities, feel free to reach out:


📄 License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
