---
license: apache-2.0
base_model:
- google/gemma-2-9b-it
tags:
- translation
---
# 🚀 ReMedy: Machine Translation Evaluation via Reward Modeling
**Learning High-Quality Machine Translation Evaluation from Human Preferences with Reward Modeling**
[](https://arxiv.org/abs/2504.13630)
[](https://pypi.org/project/remedy-mt-eval/)
[](https://github.com/Smu-Tan/Remedy/stargazers)
[](./LICENSE)
---
## ✨ About ReMedy
**ReMedy** is a new state-of-the-art machine translation (MT) evaluation framework that reframes the task as **reward modeling** rather than direct regression. Instead of relying on noisy human scores, ReMedy learns from **pairwise human preferences**, leading to better alignment with human judgments.
- 📈 **State-of-the-art accuracy** on WMT22–24 (39 language pairs, 111 systems)
- ⚖️ **Segment- and system-level** evaluation, outperforming GPT-4, PaLM-540B, Finetuned-PaLM2, MetricX-13B, and XCOMET
- 🔍 **More robust** on low-quality and out-of-domain translations (ACES, MSLC benchmarks)
- 🧠 Can be used as a **reward model** in RLHF pipelines to improve MT systems
> ReMedy demonstrates that **reward modeling with pairwise preferences** offers a more reliable and human-aligned approach for MT evaluation.
---
## 📚 Contents
- [📦 Quick Installation](#-quick-installation)
- [⚙️ Requirements](#️-requirements)
- [🚀 Usage](#-usage)
- [💾 Download Models](#-download-remedy-models)
- [🔹 Basic Usage](#-basic-usage)
- [🔹 Reference-Free Mode](#-reference-free-mode)
- [📄 Output Files](#-output-files)
- [⚙️ Full Argument List](#️-full-argument-list)
- [🧠 Model Variants](#-model-variants)
- [🔁 Reproducing WMT Results](#-reproducing-wmt-results)
- [📚 Citation](#-citation)
---
## 📦 Quick Installation
> ReMedy requires **Python ≥ 3.10**, and leverages **[VLLM](https://github.com/vllm-project/vllm)** for fast inference.
### ✅ Recommended: Install via pip
```bash
pip install remedy-mt-eval
git clone https://github.com/Smu-Tan/Remedy
cd Remedy
```
### 🛠️ Install from Source
```bash
git clone https://github.com/Smu-Tan/Remedy
cd Remedy
pip install -e .
```
### 📜 Install via Poetry
```bash
git clone https://github.com/Smu-Tan/Remedy
cd Remedy
poetry install
```
---
## ⚙️ Requirements
- `Python` ≥ 3.10
- `transformers` ≥ 4.51.1
- `vllm` ≥ 0.8.5
- `torch` ≥ 2.6.0
- *(See `pyproject.toml` for full dependencies)*
---
## 🚀 Usage
### 💾 Download ReMedy Models
Before using, download the model from HuggingFace:
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download ShaomuTan/ReMedy-9B-23 --local-dir Models/remedy-9B-23
```
You can replace `ReMedy-9B-22` with other variants like `ReMedy-9B-23`.
---
### 🔹 Basic Usage
```bash
remedy-score \
--model Models/remedy-9B-22 \
--src_file testcase/en.src \
--mt_file testcase/en-de.hyp \
--ref_file testcase/de.ref \
--src_lang en --tgt_lang de \
--cache_dir $CACHE_DIR \
--save_dir testcase \
--num_gpus 4 \
--calibrate
```
### 🔹 Reference-Free Mode (Quality Estimation)
```bash
remedy-score \
--model Models/remedy-9B-22 \
--src_file testcase/en.src \
--mt_file testcase/en-de.hyp \
--no_ref \
--src_lang en --tgt_lang de \
--cache_dir $CACHE_DIR \
--save_dir testcase \
--num_gpus 4 \
--calibrate
```
---
## 📄 Output Files
- `src-tgt_raw_scores.txt`
- `src-tgt_sigmoid_scores.txt`
- `src-tgt_calibration_scores.txt`
- `src-tgt_detailed_results.tsv`
- `src-tgt_result.json`
Inspired by **SacreBLEU**, ReMedy provides JSON-style results to ensure transparency and comparability.
📘 Example JSON Output
```json
{
"metric_name": "remedy-9B-22",
"raw_score": 4.502863049214531,
"sigmoid_score": 0.9613502018042875,
"calibration_score": 0.9029647169507162,
"calibration_temp": 1.7999999999999998,
"signature": "metric_name:remedy-9B-22|lp:en-de|ref:yes|version:0.1.1",
"language_pair": "en-de",
"source_language": "en",
"target_language": "de",
"segments": 2037,
"version": "0.1.1",
"args": {
"src_file": "testcase/en.src",
"mt_file": "testcase/en-de.hyp",
"src_lang": "en",
"tgt_lang": "de",
"model": "Models/remedy-9B-22",
"cache_dir": "Models",
"save_dir": "testcase",
"ref_file": "testcase/de.ref",
"no_ref": false,
"calibrate": true,
"num_gpus": 4,
"num_seqs": 256,
"max_length": 4096,
"enable_truncate": false,
"version": false,
"list_languages": false
}
}
```
---
## ⚙️ Full Argument List
📋 Show CLI Arguments
### 🔸 Required
```python
--src_file # Path to source file
--mt_file # Path to MT output file
--src_lang # Source language code
--tgt_lang # Target language code
--model # Model path or HuggingFace ID
--save_dir # Output directory
```
### 🔸 Optional
```python
--ref_file # Reference file path
--no_ref # Reference-free mode
--cache_dir # Cache directory
--calibrate # Enable calibration
--num_gpus # Number of GPUs
--num_seqs # Number of sequences (default: 256)
--max_length # Max token length (default: 4096)
--enable_truncate # Truncate sequences
--version # Print version
--list_languages # List supported languages
```
---
## 🧠 Model Variants
| Model | Size | Base Model | Ref/QE | Download |
|---------------|------|--------------|--------|----------|
| ReMedy-9B-22 | 9B | Gemma-2-9B | Both | [🤗 HuggingFace](https://huggingface.co/ShaomuTan/ReMedy-9B-22) |
| ReMedy-9B-23 | 9B | Gemma-2-9B | Both | [🤗 HuggingFace](https://huggingface.co/ShaomuTan/ReMedy-9B-23) |
| ReMedy-9B-24 | 9B | Gemma-2-9B | Both | [🤗 HuggingFace](https://huggingface.co/ShaomuTan/ReMedy-9B-24) |
> More variants coming soon...
---
## 🔁 Reproducing WMT Results
Click to show instructions for reproducing WMT22–24 evaluation
### 1. Install `mt-metrics-eval`
```bash
git clone https://github.com/google-research/mt-metrics-eval.git
cd mt-metrics-eval
pip install .
```
### 2. Download WMT evaluation data
```bash
python3 -m mt_metrics_eval.mtme --download
```
### 3. Run ReMedy on WMT data
```bash
bash wmt/wmt22.sh
bash wmt/wmt23.sh
bash wmt/wmt24.sh
```
> 📄 Results will be comparable with other metrics reported in WMT shared tasks.
---
## 📚 Citation
If you use **ReMedy**, please cite the following paper:
```bibtex
@article{tan2024remedy,
title={ReMedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling},
author={Tan, Shaomu and Monz, Christof},
journal={arXiv preprint},
year={2024}
}
```
---