English - Efik (NLLB-200 Distilled)

Fine-tuned NLLB-200 model for translating English -> Efik. Since Efik is not directly supported in NLLB, the Igbo language code ibo_Latn is used as a close proxy during both training and inference.

Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "luel/nllb-200-distilled-600M-ft-en-efi"

# Efik has no NLLB language code, so the source is eng_Latn and the
# target uses the Igbo proxy ibo_Latn.
tokenizer = AutoTokenizer.from_pretrained(model_id, token=True, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, token=True)

input_example = "How are you?"
inputs = tokenizer(input_example, return_tensors="pt")

# Force the first generated token to the Igbo code, which stands in for Efik.
generated_ids = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("ibo_Latn"),
    max_length=30,
)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
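
For multiple sentences, the same call can be wrapped in a small batching helper. This is a minimal sketch reusing the tokenizer and model loaded above; the num_beams and max_length values are illustrative choices, not settings from the model card.

def translate_batch(sentences, max_length=128, num_beams=4):
    # Tokenize a batch of English sentences; padding lets them share one tensor.
    inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("ibo_Latn"),  # Efik proxy
        max_length=max_length,
        num_beams=num_beams,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

print(translate_batch(["Good morning.", "Where is the market?"]))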

Training details (summary)

| Item | Value |
|---|---|
| Base model | facebook/nllb-200-distilled-600M |
| Dataset | Davlan/ibom-mt-en-efi |
| Script | lafand-mt |
| Epochs | 8 |
| Effective batch size | 32 (16 × 2 gradient accumulation) |
| Learning rate | 3e-5 |
| Mixed precision | bf16 |
| Early stopping | patience = 3, min_delta (BLEU) = 0.001 |
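
The hyperparameters above map onto a Hugging Face Seq2SeqTrainingArguments configuration roughly as follows. This is a hypothetical reconstruction, not the actual lafand-mt script: the output directory, evaluation cadence, and BLEU metric key are assumptions.

from transformers import EarlyStoppingCallback, Seq2SeqTrainingArguments

# Hypothetical reconstruction of the configuration in the table above.
training_args = Seq2SeqTrainingArguments(
    output_dir="nllb-200-distilled-600M-ft-en-efi",  # assumed directory name
    num_train_epochs=8,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,  # effective batch size: 16 x 2 = 32
    learning_rate=3e-5,
    bf16=True,
    predict_with_generate=True,     # generate during eval so BLEU can be computed
    eval_strategy="epoch",          # assumed; `evaluation_strategy` on older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="bleu",   # assumed key returned by compute_metrics
    greater_is_better=True,
)

# Early stopping as listed: patience = 3 evaluations, min BLEU gain of 0.001.
early_stopping = EarlyStoppingCallback(
    early_stopping_patience=3,
    early_stopping_threshold=0.001,
)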

Evaluation

| Metric | en->efi |
|---|---|
| BLEU | 39.9 |
| chrF | 58.5 |
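
Scores of this kind can be computed with sacrebleu. The snippet below is a minimal sketch with placeholder strings; the reported numbers come from the dataset's test split, which is not reproduced here.

import sacrebleu

# Placeholders; in practice, hypotheses are model translations of the test
# split of Davlan/ibom-mt-en-efi and references are its gold Efik sentences.
hypotheses = ["<model translation 1>", "<model translation 2>"]
references = [["<reference 1>", "<reference 2>"]]  # one inner list per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}  chrF: {chrf.score:.1f}")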

Limitations

- Using the Igbo language token (ibo_Latn) as a stand-in for Efik may introduce Igbo-influenced lexical choices into the output.