ThinMQM
ThinMQM: a model and data collection for automated translation evaluation with MQM (Multidimensional Quality Metrics).
| Metric/Model | Avg. | En-De SPA (%) | En-De $Acc^*_{eq}$ | En-Es SPA (%) | En-Es $Acc^*_{eq}$ | Ja-Zh SPA (%) | Ja-Zh $Acc^*_{eq}$ |
|---|---|---|---|---|---|---|---|
| QwQ 32B | 68.3 | 79.8 | 46.8 | 76.1 | 68.0 | 91.9 | 46.9 |
| + ThinMQM | 72.2 (+3.9) | 83.2 (+3.4) | 52.5 (+5.7) | 80.7 (+4.6) | 69.2 (+1.2) | 91.3 (−0.6) | 56.1 (+9.2) |
| R1-Distill-Llama-8B | 64.9 | 71.8 | 42.9 | 78.5 | 68.0 | 84.7 | 43.5 |
| + ThinMQM | 70.8 (+5.9) | 85.5 (+13.7) | 48.6 (+5.7) | 81.3 (+2.8) | 68.2 (+0.2) | 90.5 (+5.8) | 51.0 (+7.5) |
| R1-Distill-Qwen-7B | 61.1 | 67.3 | 42.9 | 61.0 | 68.0 | 83.8 | 43.5 |
| + ThinMQM | 69.8 (+8.7) | 84.5 (+17.2) | 48.5 (+5.6) | 77.8 (+16.8) | 68.0 (+0.0) | 89.0 (+5.2) | 51.3 (+7.8) |
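Both metrics above compare model-predicted quality assessments against human MQM annotations. As background, an MQM score is conventionally derived by summing severity-weighted error penalties. A minimal sketch, assuming the common WMT-style weights (minor = 1, major = 5); the exact weighting scheme ThinMQM uses is not specified here:

```python
# Minimal sketch of MQM scoring from annotated error spans.
# Weights follow common WMT MQM practice and are an assumption,
# not necessarily the exact scheme used by ThinMQM.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5}

def mqm_score(errors):
    """Sum penalty weights over (category, severity) error annotations.

    Returns a non-positive score; 0 means no errors were found.
    """
    return -sum(SEVERITY_WEIGHTS[severity] for _, severity in errors)

errors = [
    ("accuracy/mistranslation", "major"),
    ("fluency/grammar", "minor"),
]
print(mqm_score(errors))  # -> -6
```

A higher (less negative) score indicates a better translation, which is what system-level agreement metrics such as SPA then compare against human rankings.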
| Released Models | HF Model | Template | Trained Dataset |
|---|---|---|---|
| rzzhan/ThinMQM-32B | https://huggingface.co/rzzhan/ThinMQM-32B | thinking | https://huggingface.co/datasets/rzzhan/ThinMQM-12k (thinmqm12k_src) |
| rzzhan/ThinMQM-8B | https://huggingface.co/rzzhan/ThinMQM-8B | thinking_ref | https://huggingface.co/datasets/rzzhan/ThinMQM-12k (thinmqm12k_ref) |
| rzzhan/ThinMQM-7B | https://huggingface.co/rzzhan/ThinMQM-7B | thinking_ref | https://huggingface.co/datasets/rzzhan/ThinMQM-12k (thinmqm12k_ref) |
If you find our model, data, or evaluation code useful, please cite our paper:
```bibtex
@article{zhan2025thinmqm,
  title   = {Are Large Reasoning Models Good Translation Evaluators? Analysis and Performance Boost},
  author  = {Zhan, Runzhe and Huang, Zhihong and Yang, Xinyi and Chao, Lidia S. and Yang, Min and Wong, Derek F.},
  journal = {ArXiv preprint},
  volume  = {2510.20780},
  year    = {2025},
  url     = {https://arxiv.org/abs/2510.20780},
}
```
For questions, feedback, or collaboration opportunities, feel free to reach out.
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.