---
license: apache-2.0
base_model:
- google/gemma-2-9b-it
tags:
- translation
---

# 🚀 ReMedy: Machine Translation Evaluation via Reward Modeling
**Learning High-Quality Machine Translation Evaluation from Human Preferences with Reward Modeling**
[![arXiv](https://img.shields.io/badge/arXiv-2504.13630-b31b1b)](https://arxiv.org/abs/2504.13630)
[![PyPI version](https://img.shields.io/pypi/v/remedy-mt-eval)](https://pypi.org/project/remedy-mt-eval/)
[![GitHub Stars](https://img.shields.io/github/stars/Smu-Tan/Remedy)](https://github.com/Smu-Tan/Remedy/stargazers)
[![License](https://img.shields.io/github/license/Smu-Tan/Remedy)](./LICENSE)

---

## ✨ About ReMedy

**ReMedy** is a state-of-the-art machine translation (MT) evaluation framework that reframes the task as **reward modeling** rather than direct regression. Instead of relying on noisy absolute human scores, ReMedy learns from **pairwise human preferences**, leading to better alignment with human judgments.

- 📈 **State-of-the-art accuracy** on WMT22–24 (39 language pairs, 111 systems)
- ⚖️ **Segment- and system-level** evaluation, outperforming GPT-4, PaLM-540B, Finetuned-PaLM2, MetricX-13B, and XCOMET
- 🔍 **More robust** on low-quality and out-of-domain translations (ACES, MSLC benchmarks)
- 🧠 Can be used as a **reward model** in RLHF pipelines to improve MT systems

> ReMedy demonstrates that **reward modeling with pairwise preferences** offers a more reliable and human-aligned approach for MT evaluation.

---

## 📚 Contents

- [📦 Quick Installation](#-quick-installation)
- [⚙️ Requirements](#️-requirements)
- [🚀 Usage](#-usage)
  - [💾 Download Models](#-download-remedy-models)
  - [🔹 Basic Usage](#-basic-usage)
  - [🔹 Reference-Free Mode](#-reference-free-mode)
- [📄 Output Files](#-output-files)
- [⚙️ Full Argument List](#️-full-argument-list)
- [🧠 Model Variants](#-model-variants)
- [🔁 Reproducing WMT Results](#-reproducing-wmt-results)
- [📚 Citation](#-citation)

---

## 📦 Quick Installation

> ReMedy requires **Python ≥ 3.10** and leverages **[vLLM](https://github.com/vllm-project/vllm)** for fast inference.

### ✅ Recommended: Install via pip

```bash
pip install remedy-mt-eval
git clone https://github.com/Smu-Tan/Remedy
cd Remedy
```

Cloning the repository additionally provides the `testcase/` files and `wmt/` scripts used below.

### 🛠️ Install from Source

```bash
git clone https://github.com/Smu-Tan/Remedy
cd Remedy
pip install -e .
```

### 📜 Install via Poetry

```bash
git clone https://github.com/Smu-Tan/Remedy
cd Remedy
poetry install
```

---

## ⚙️ Requirements

- `Python` ≥ 3.10
- `transformers` ≥ 4.51.1
- `vllm` ≥ 0.8.5
- `torch` ≥ 2.6.0
- *(See `pyproject.toml` for full dependencies)*

---

## 🚀 Usage

### 💾 Download ReMedy Models

Before scoring, download a model from HuggingFace:

```bash
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download ShaomuTan/ReMedy-9B-22 --local-dir Models/remedy-9B-22
```

You can replace `ReMedy-9B-22` with other variants such as `ReMedy-9B-23` or `ReMedy-9B-24`.

---

### 🔹 Basic Usage

```bash
remedy-score \
    --model Models/remedy-9B-22 \
    --src_file testcase/en.src \
    --mt_file testcase/en-de.hyp \
    --ref_file testcase/de.ref \
    --src_lang en --tgt_lang de \
    --cache_dir $CACHE_DIR \
    --save_dir testcase \
    --num_gpus 4 \
    --calibrate
```

### 🔹 Reference-Free Mode (Quality Estimation)

```bash
remedy-score \
    --model Models/remedy-9B-22 \
    --src_file testcase/en.src \
    --mt_file testcase/en-de.hyp \
    --no_ref \
    --src_lang en --tgt_lang de \
    --cache_dir $CACHE_DIR \
    --save_dir testcase \
    --num_gpus 4 \
    --calibrate
```

---

## 📄 Output Files

- `src-tgt_raw_scores.txt`
- `src-tgt_sigmoid_scores.txt`
- `src-tgt_calibration_scores.txt`
- `src-tgt_detailed_results.tsv`
- `src-tgt_result.json`

Inspired by **SacreBLEU**, ReMedy reports JSON-style results to ensure transparency and comparability.
**📘 Example JSON Output**

```json
{
  "metric_name": "remedy-9B-22",
  "raw_score": 4.502863049214531,
  "sigmoid_score": 0.9613502018042875,
  "calibration_score": 0.9029647169507162,
  "calibration_temp": 1.7999999999999998,
  "signature": "metric_name:remedy-9B-22|lp:en-de|ref:yes|version:0.1.1",
  "language_pair": "en-de",
  "source_language": "en",
  "target_language": "de",
  "segments": 2037,
  "version": "0.1.1",
  "args": {
    "src_file": "testcase/en.src",
    "mt_file": "testcase/en-de.hyp",
    "src_lang": "en",
    "tgt_lang": "de",
    "model": "Models/remedy-9B-22",
    "cache_dir": "Models",
    "save_dir": "testcase",
    "ref_file": "testcase/de.ref",
    "no_ref": false,
    "calibrate": true,
    "num_gpus": 4,
    "num_seqs": 256,
    "max_length": 4096,
    "enable_truncate": false,
    "version": false,
    "list_languages": false
  }
}
```
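Because the results are plain JSON, downstream tooling can consume them directly. A minimal sketch with `jq`, assuming the en→de run from Basic Usage, so that the output file follows the `src-tgt_result.json` naming pattern inside `--save_dir`:

```bash
# Pull the headline scores out of the result JSON (filename assumed from the
# src-tgt naming pattern and the --save_dir used in Basic Usage).
jq '{raw_score, sigmoid_score, calibration_score, signature}' testcase/en-de_result.json
```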
---

## ⚙️ Full Argument List
### 🔸 Required

```bash
--src_file          # Path to source file
--mt_file           # Path to MT output file
--src_lang          # Source language code
--tgt_lang          # Target language code
--model             # Model path or HuggingFace ID
--save_dir          # Output directory
```

### 🔸 Optional

```bash
--ref_file          # Reference file path
--no_ref            # Reference-free mode
--cache_dir         # Cache directory
--calibrate         # Enable calibration
--num_gpus          # Number of GPUs
--num_seqs          # Number of sequences (default: 256)
--max_length        # Max token length (default: 4096)
--enable_truncate   # Truncate sequences
--version           # Print version
--list_languages    # List supported languages
```
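The required and optional flags compose freely. As a sketch, a reference-free run that reduces memory pressure by lowering the number of concurrent sequences and enabling truncation (file paths reuse the test case from Basic Usage; the specific flag values are illustrative, not recommendations):

```bash
# Reference-free scoring with truncation enabled and fewer concurrent sequences.
remedy-score \
    --model Models/remedy-9B-22 \
    --src_file testcase/en.src \
    --mt_file testcase/en-de.hyp \
    --no_ref \
    --src_lang en --tgt_lang de \
    --save_dir testcase \
    --num_gpus 1 \
    --num_seqs 64 \
    --max_length 4096 \
    --enable_truncate
```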
---

## 🧠 Model Variants

| Model        | Size | Base Model | Ref/QE | Download |
|--------------|------|------------|--------|----------|
| ReMedy-9B-22 | 9B   | Gemma-2-9B | Both   | [🤗 HuggingFace](https://huggingface.co/ShaomuTan/ReMedy-9B-22) |
| ReMedy-9B-23 | 9B   | Gemma-2-9B | Both   | [🤗 HuggingFace](https://huggingface.co/ShaomuTan/ReMedy-9B-23) |
| ReMedy-9B-24 | 9B   | Gemma-2-9B | Both   | [🤗 HuggingFace](https://huggingface.co/ShaomuTan/ReMedy-9B-24) |

> More variants coming soon...
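If you want all three variants locally, a small loop over the same `huggingface-cli` command from the Usage section does the job; the local directory names here are an assumption mirroring the layout used earlier:

```bash
# Download every ReMedy variant listed above (directory layout mirrors the Usage section).
for v in 22 23 24; do
  HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download \
    "ShaomuTan/ReMedy-9B-${v}" --local-dir "Models/remedy-9B-${v}"
done
```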
---

## 🔁 Reproducing WMT Results

### 1. Install `mt-metrics-eval`

```bash
git clone https://github.com/google-research/mt-metrics-eval.git
cd mt-metrics-eval
pip install .
```

### 2. Download WMT evaluation data

```bash
python3 -m mt_metrics_eval.mtme --download
```

### 3. Run ReMedy on WMT data

```bash
bash wmt/wmt22.sh
bash wmt/wmt23.sh
bash wmt/wmt24.sh
```

> 📄 Results will be comparable with other metrics reported in WMT shared tasks.
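By default, `mtme` stores the downloaded data under `$HOME/.mt-metrics-eval`; a quick sanity check before running the scripts (the path is assumed to be the tool's default, so adjust it if you configured a different location):

```bash
# Sanity check: list the WMT test sets mtme downloaded (default storage location).
ls $HOME/.mt-metrics-eval/mt-metrics-eval-v2
```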
---

## 📚 Citation

If you use **ReMedy**, please cite the following paper:

```bibtex
@article{tan2025remedy,
  title={ReMedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling},
  author={Tan, Shaomu and Monz, Christof},
  journal={arXiv preprint arXiv:2504.13630},
  year={2025}
}
```

---