English - Efik (NLLB-200 Distilled)

Fine-tuned NLLB-200 model for translating English -> Efik. Since Efik is not directly supported in NLLB, the Igbo language code ibo_Latn is used as a close proxy during both training and inference.

Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "luel/nllb-200-distilled-600M-ft-en-efi"

# Efik has no NLLB language code, so the source is eng_Latn and the
# target uses the Igbo proxy ibo_Latn.
tokenizer = AutoTokenizer.from_pretrained(model_id, token=True, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, token=True)

input_example = "How are you?"
inputs = tokenizer(input_example, return_tensors="pt")

# Force the first generated token to the Igbo code, which stands in for Efik.
generated_ids = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("ibo_Latn"),
    max_length=30,
)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
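
For multiple sentences, the same call can be wrapped in a small batching helper. This is a minimal sketch reusing the tokenizer and model loaded above; the num_beams and max_length values are illustrative choices, not settings from the model card.

def translate_batch(sentences, max_length=128, num_beams=4):
    # Tokenize a batch of English sentences; padding lets them share one tensor.
    inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("ibo_Latn"),  # Efik proxy
        max_length=max_length,
        num_beams=num_beams,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

print(translate_batch(["Good morning.", "Where is the market?"]))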

Training details (summary)

| Item | Value |
|---|---|
| Base model | facebook/nllb-200-distilled-600M |
| Dataset | Davlan/ibom-mt-en-efi |
| Script | lafand-mt |
| Epochs | 8 |
| Effective batch size | 32 (16 × 2 gradient accumulation) |
| Learning rate | 3e-5 |
| Mixed precision | bf16 |
| Early stopping | patience = 3, min_delta (BLEU) = 0.001 |
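
The hyperparameters above map onto a Hugging Face Seq2SeqTrainingArguments configuration roughly as follows. This is a hypothetical reconstruction, not the actual lafand-mt script: the output directory, evaluation cadence, and BLEU metric key are assumptions.

from transformers import EarlyStoppingCallback, Seq2SeqTrainingArguments

# Hypothetical reconstruction of the configuration in the table above.
training_args = Seq2SeqTrainingArguments(
    output_dir="nllb-200-distilled-600M-ft-en-efi",  # assumed directory name
    num_train_epochs=8,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,  # effective batch size: 16 x 2 = 32
    learning_rate=3e-5,
    bf16=True,
    predict_with_generate=True,     # generate during eval so BLEU can be computed
    eval_strategy="epoch",          # assumed; `evaluation_strategy` on older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="bleu",   # assumed key returned by compute_metrics
    greater_is_better=True,
)

# Early stopping as listed: patience = 3 evaluations, min BLEU gain of 0.001.
early_stopping = EarlyStoppingCallback(
    early_stopping_patience=3,
    early_stopping_threshold=0.001,
)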

Evaluation

| Metric | en->efi |
|---|---|
| BLEU | 39.9 |
| chrF | 58.5 |
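
Scores of this kind can be computed with sacrebleu. The snippet below is a minimal sketch with placeholder strings; the reported numbers come from the dataset's test split, which is not reproduced here.

import sacrebleu

# Placeholders; in practice, hypotheses are model translations of the test
# split of Davlan/ibom-mt-en-efi and references are its gold Efik sentences.
hypotheses = ["<model translation 1>", "<model translation 2>"]
references = [["<reference 1>", "<reference 2>"]]  # one inner list per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}  chrF: {chrf.score:.1f}")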

Limitations

- Using the Igbo language token (ibo_Latn) as a stand-in for Efik may introduce Igbo-influenced lexical choices into the output.