πŸ”οΈ MarianMT English β†’ Atlasic Tamazight (Tachelhit / Central Atlas Tamazight)

This model is a fine-tuned version of Helsinki-NLP/opus-mt-en-ber that translates from English into Atlasic Tamazight (Tachelhit / Central Atlas Tamazight).


## πŸ“˜ Model Overview

| Property | Description |
|---|---|
| Base Model | Helsinki-NLP/opus-mt-en-ber |
| Architecture | MarianMT |
| Languages | English β†’ Tamazight (Tachelhit / Central Atlas Tamazight) |
| Fine-tuning Dataset | 893K medium-quality synthetic sentence pairs, generated by translating English corpora with NLLB-200 |
| Training Objective | Sequence-to-sequence translation fine-tuning |
| Framework | πŸ€— Transformers |
| Tokenizer | SentencePiece |
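A minimal inference sketch with πŸ€— Transformers. The `translate` helper is illustrative (not part of the released code); the decoding arguments follow the recommended parameters in the Notes section.

```python
from transformers import MarianMTModel, MarianTokenizer

MODEL_ID = "ilyasaqit/opus-mt-en-atlasic_tamazight-synth893k-nmv"

def translate(sentences):
    """Translate a list of English sentences into Atlasic Tamazight.

    Downloads the model weights from the Hub on first call.
    """
    tokenizer = MarianTokenizer.from_pretrained(MODEL_ID)
    model = MarianMTModel.from_pretrained(MODEL_ID)
    batch = tokenizer(sentences, return_tensors="pt", padding=True,
                      truncation=True, max_length=140)
    outputs = model.generate(
        **batch,
        num_beams=6,
        no_repeat_ngram_size=3,
        repetition_penalty=1.5,
        max_length=140,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

# print(translate(["I will go to school."]))  # card's example output: rad dduΙ£ s tinml.
```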

## 🧠 Training Details

| Hyperparameter | Value |
|---|---|
| `per_device_train_batch_size` | 16 |
| `per_device_eval_batch_size` | 64 |
| `learning_rate` | 2e-5 |
| `num_train_epochs` | 3 |
| `max_length` | 140 |
| `num_beams` | 6 |
| `eval_steps` | 20000 |
| `save_steps` | 20000 |
| `generation_no_repeat_ngram_size` | 3 |
| `generation_repetition_penalty` | 1.5 |

Training Environment:
- 1 Γ— NVIDIA P100 (16 GB) on Kaggle
- Total training time: 9 h 50 m 28 s
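The hyperparameters above correspond roughly to the following `Seq2SeqTrainingArguments`. This is a sketch, not the released training script: `output_dir` is a placeholder, and the evaluation-strategy setting is an assumption.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the training configuration (output_dir is a placeholder).
args = Seq2SeqTrainingArguments(
    output_dir="opus-mt-en-atlasic_tamazight",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    learning_rate=2e-5,
    num_train_epochs=3,
    eval_strategy="steps",   # assumed; evaluation every eval_steps
    eval_steps=20_000,
    save_steps=20_000,
    predict_with_generate=True,
    generation_max_length=140,
    generation_num_beams=6,
)
```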

## πŸ“ˆ Evaluation Results

⚠️ Note: The validation set is fully synthetic (NLLB-200). BLEU only measures similarity to synthetic outputs, not human-level accuracy.

| Step | Train Loss | Val Loss | BLEU | chrF |
|---|---|---|---|---|
| 20000 | 0.2423 | 0.2235 | 18.87 | 36.51 |
| 40000 | 0.1870 | 0.1806 | 24.73 | 42.64 |
| 60000 | 0.1633 | 0.1613 | 27.20 | 45.86 |
| 80000 | 0.1556 | 0.1497 | 30.25 | 48.49 |
| 100000 | 0.1479 | 0.1416 | 31.57 | 50.11 |
| 120000 | 0.1390 | 0.1325 | 33.89 | 52.53 |
| 140000 | 0.1317 | 0.1269 | 35.90 | 54.55 |
| 160000 | 0.1323 | 0.1243 | 36.57 | 55.15 |

## πŸ’¬ Example Translations

| English | Atlasic Tamazight (Latin) | Atlasic Tamazight (Tifinagh) |
|---|---|---|
| I will go to school. | rad dduΙ£ s tinml. | β΅”β΄°β΄· β΄·β΄·β΅“β΅– β΅™ β΅œβ΅‰β΅β΅Žβ΅. |
| What did you say? | mayd tnnit? | ⡎ⴰ⡒ⴷ β΅œβ΅β΅β΅‰β΅œ? |
| I want to know where Tom and Mary come from. | riΙ£ ad ssnΙ£ mani d yucka αΉ­um d mari. | ⡔⡉⡖ β΄°β΄· ⡙⡙⡏⡖ β΅Žβ΄°β΅β΅‰ β΄· β΅’β΅“β΅›β΄½β΄° β΅Ÿβ΅“β΅Ž β΄· β΅Žβ΄°β΅”β΅‰. |
| How many girls are there in this picture? | mnck n trbatin ayd illan g twlaft ad? | β΅Žβ΅β΅›β΄½ ⡏ β΅œβ΅”β΄±β΄°β΅œβ΅‰β΅ β΄°β΅’β΄· ⡉⡍⡍ⴰ⡏ β΄³ ⡜⡑⡍ⴰⴼ⡜ β΄°β΄·? |

Hugging Face Space:
πŸ‘‰ ilyasaqit/English-Tamazight-Translator


## πŸͺΆ Notes

- The dataset is synthetic and not manually verified.
- The model performs best on short, simple general-domain sentences.
- Recommended decoding parameters:
  - `num_beams=6`
  - `repetition_penalty=1.2–1.5`
  - `no_repeat_ngram_size=3`
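The recommendations above can be packaged as a `GenerationConfig` and passed to `model.generate`. A minimal sketch; the repetition penalty here is set to 1.2, the low end of the recommended range:

```python
from transformers import GenerationConfig

# Recommended decoding settings from the notes above.
gen_config = GenerationConfig(
    num_beams=6,
    repetition_penalty=1.2,  # anywhere in 1.2-1.5 is recommended
    no_repeat_ngram_size=3,
    max_length=140,  # matches the training-time max_length
)

# Usage: model.generate(**inputs, generation_config=gen_config)
```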

## πŸ“š Citation

If you use this model, please cite:

```bibtex
@misc{marian-en-tamazight-2025,
  title  = {MarianMT English β†’ Atlasic Tamazight (Tachelhit / Central Atlas)},
  year   = {2025},
  url    = {https://huggingface.co/ilyasaqit/opus-mt-en-atlasic_tamazight-synth893k-nmv}
}
```