Fine-tuned NLLB-200 model for translating English -> Efik. Since Efik is not directly supported in NLLB-200, the Igbo language code `ibo_Latn` is used as a close proxy during both training and inference.
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "luel/nllb-200-distilled-600M-ft-en-efi"

# src_lang tells the tokenizer the source language; ibo_Latn stands in for
# Efik on the target side, since Efik has no dedicated NLLB code.
tokenizer = AutoTokenizer.from_pretrained(model_id, token=True, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, token=True)

input_example = "How are you?"
inputs = tokenizer(input_example, return_tensors="pt")

# Force the decoder to begin with the target-language token.
generated_ids = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("ibo_Latn"),
    max_length=30,
)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```
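
The same model also works through the high-level `pipeline` API. A minimal sketch, again routing the output through `ibo_Latn`:

```python
from transformers import pipeline

# NLLB translation pipeline; ibo_Latn again stands in for Efik as the target.
translator = pipeline(
    "translation",
    model="luel/nllb-200-distilled-600M-ft-en-efi",
    src_lang="eng_Latn",
    tgt_lang="ibo_Latn",
    max_length=30,
)
print(translator("How are you?")[0]["translation_text"])
```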
| Item | Value |
|---|---|
| Base model | facebook/nllb-200-distilled-600M |
| Dataset | Davlan/ibom-mt-en-efi |
| Script | lafand-mt |
| Epochs | 8 |
| Effective batch size | 32 (16 × 2 grad-accum) |
| Learning rate | 3e-5 |
| Mixed precision | bf16 |
| Early stopping | Patience = 3, min_delta (BLEU) = 0.001 |
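
For reference, the hyperparameters above map onto a standard `Seq2SeqTrainer` configuration. A minimal sketch, assuming the lafand-mt script follows the usual Hugging Face training loop (argument names such as `output_dir` and the metric key `bleu` are illustrative assumptions, not the exact script):

```python
from transformers import Seq2SeqTrainingArguments, EarlyStoppingCallback

# Values from the table above; everything else is an illustrative assumption.
training_args = Seq2SeqTrainingArguments(
    output_dir="nllb-200-distilled-600M-ft-en-efi",  # assumed output path
    num_train_epochs=8,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,  # effective batch size: 16 * 2 = 32
    learning_rate=3e-5,
    bf16=True,
    eval_strategy="epoch",  # `evaluation_strategy` in older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,  # required by EarlyStoppingCallback
    metric_for_best_model="bleu",  # assumed metric key
    greater_is_better=True,
    predict_with_generate=True,
)

# Stop if dev BLEU fails to improve by at least 0.001 for 3 evaluations.
early_stopping = EarlyStoppingCallback(
    early_stopping_patience=3,
    early_stopping_threshold=0.001,
)
```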
| Metric | en->efi |
|---|---|
| BLEU | 39.9 |
| chrF | 58.5 |
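
The scores above correspond to standard sacrebleu-style corpus evaluation. A minimal sketch of how BLEU and chrF can be computed on a held-out test set (the `predictions` and `references` lists are placeholders for decoded model outputs and gold Efik translations):

```python
import sacrebleu

# Placeholders: decoded model outputs and the matching gold Efik sentences.
predictions = ["Efik hypothesis 1", "Efik hypothesis 2"]
references = [["Efik reference 1", "Efik reference 2"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(predictions, references)
chrf = sacrebleu.corpus_chrf(predictions, references)
print(f"BLEU: {bleu.score:.1f}  chrF: {chrf.score:.1f}")
```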