Are Large Reasoning Models Good Translation Evaluators? Analysis and Performance Boost

[Paper] · [GitHub] · [Hugging Face Collection]

Metrics

| Metric/Model | Avg. | En-De SPA (%) | En-De $Acc^*_{eq}$ | En-Es SPA (%) | En-Es $Acc^*_{eq}$ | Ja-Zh SPA (%) | Ja-Zh $Acc^*_{eq}$ |
|---|---|---|---|---|---|---|---|
| QwQ-32B | 68.3 | 79.8 | 46.8 | 76.1 | 68.0 | 91.9 | 46.9 |
| + ThinMQM | 72.2 (+3.9) | 83.2 (+3.4) | 52.5 (+5.7) | 80.7 (+4.6) | 69.2 (+1.2) | 91.3 (−0.6) | 56.1 (+9.2) |
| R1-Distill-Llama-8B | 64.9 | 71.8 | 42.9 | 78.5 | 68.0 | 84.7 | 43.5 |
| + ThinMQM | 70.8 (+5.9) | 85.5 (+13.7) | 48.6 (+5.7) | 81.3 (+2.8) | 68.2 (+0.2) | 90.5 (+5.8) | 51.0 (+7.5) |
| R1-Distill-Qwen-7B | 61.1 | 67.3 | 42.9 | 61.0 | 68.0 | 83.8 | 43.5 |
| + ThinMQM | 69.8 (+8.7) | 84.5 (+17.2) | 48.5 (+5.6) | 77.8 (+16.8) | 68.0 (+0.0) | 89.0 (+5.2) | 51.3 (+7.8) |
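The table reports SPA and $Acc^*_{eq}$, a pairwise ranking accuracy that also credits correctly predicted ties. As a rough illustration of the pairwise statistic (a simplified sketch, not the official WMT meta-evaluation implementation, and the tie threshold `eps` stands in for the paper's tie calibration):

```python
from itertools import combinations

def pairwise_accuracy_with_ties(human, metric, eps=0.0):
    """Fraction of item pairs whose human ordering (win / loss / tie)
    is reproduced by the metric.  `eps` is a tie threshold applied to
    metric score differences, a stand-in for tie calibration."""
    pairs = list(combinations(range(len(human)), 2))
    if not pairs:
        return 0.0

    def rel(a, b, tol):
        # Return 0 for a tie within tolerance, else the sign of (a - b).
        d = a - b
        if abs(d) <= tol:
            return 0
        return 1 if d > 0 else -1

    correct = sum(
        rel(human[i], human[j], 0.0) == rel(metric[i], metric[j], eps)
        for i, j in pairs
    )
    return correct / len(pairs)
```

For example, `pairwise_accuracy_with_ties([1, 2, 2, 3], [0.1, 0.5, 0.5, 0.9])` is 1.0, since every human win, loss, and tie is reproduced by the metric scores.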

Model & Data Card

| Released Models | HF Model | Template | Trained Dataset |
|---|---|---|---|
| rzzhan/ThinMQM-32B | https://huggingface.co/rzzhan/ThinMQM-32B | thinking | https://huggingface.co/datasets/rzzhan/ThinMQM-12k (thinmqm12k_src) |
| rzzhan/ThinMQM-8B | https://huggingface.co/rzzhan/ThinMQM-8B | thinking_ref | https://huggingface.co/datasets/rzzhan/ThinMQM-12k (thinmqm12k_ref) |
| rzzhan/ThinMQM-7B | https://huggingface.co/rzzhan/ThinMQM-7B | thinking_ref | https://huggingface.co/datasets/rzzhan/ThinMQM-12k (thinmqm12k_ref) |
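ThinMQM models produce MQM-style error annotations, which are then aggregated into a segment score. As an illustrative sketch only (the model's exact output format is defined by the chat templates above, and `errors` here is a hypothetical parsed representation), scoring under the standard WMT MQM severity weights (minor = −1, major = −5, floored at −25 for a non-translation) looks like:

```python
# Conventional WMT MQM severity penalties (assumption: ThinMQM follows
# this standard weighting; check the paper/templates for specifics).
SEVERITY_WEIGHTS = {"minor": -1, "major": -5}

def mqm_segment_score(errors, floor=-25):
    """Sum severity penalties over annotated errors, floored at -25,
    the conventional score assigned to a non-translation."""
    score = sum(SEVERITY_WEIGHTS[e["severity"]] for e in errors)
    return max(score, floor)
```

For example, one major plus one minor error yields a segment score of −6, and any accumulation of errors is capped at −25.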

πŸ“ Citation

If you find our model, data, or evaluation code useful, please cite our paper:

@article{zhan2025thinmqm,
  title   = {Are Large Reasoning Models Good Translation Evaluators? Analysis and Performance Boost},
  author  = {Zhan, Runzhe and Huang, Zhihong and Yang, Xinyi and Chao, Lidia S. and Yang, Min and Wong, Derek F.},
  journal = {ArXiv preprint},
  volume  = {2510.20780},
  year    = {2025},
  url     = {https://arxiv.org/abs/2510.20780}
}

📬 Contact

For questions, feedback, or collaboration opportunities, feel free to reach out.


📄 License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
