MarianMT English → Atlasic Tamazight (Tachelhit / Central Atlas Tamazight)
This model is a fine-tuned version of Helsinki-NLP/opus-mt-en-ber that translates from English to Atlasic Tamazight (Tachelhit / Central Atlas Tamazight).
Model Overview
| Property | Description |
|---|---|
| Base Model | Helsinki-NLP/opus-mt-en-ber |
| Architecture | MarianMT |
| Languages | English → Tamazight (Tachelhit / Central Atlas Tamazight) |
| Fine-tuning Dataset | 893K medium-quality synthetic sentence pairs, generated by translating English corpora with NLLB-200 |
| Training Objective | Sequence-to-sequence translation fine-tuning |
| Framework | 🤗 Transformers |
| Tokenizer | SentencePiece |
Training Details
| Hyperparameter | Value |
|---|---|
| per_device_train_batch_size | 16 |
| per_device_eval_batch_size | 64 |
| learning_rate | 2e-5 |
| num_train_epochs | 3 |
| max_length | 140 |
| num_beams | 6 |
| eval_steps | 20000 |
| save_steps | 20000 |
| generation_no_repeat_ngram_size | 3 |
| generation_repetition_penalty | 1.5 |
Training Environment:
- 1 × NVIDIA P100 (16 GB) on Kaggle
- Total training time: 9 h 50 m 28 s
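As a rough cross-check of the figures above, the number of optimizer steps can be estimated from the dataset size and batch size. This is a back-of-the-envelope sketch only: it assumes no gradient accumulation and that all 893K pairs were used for training, neither of which is stated in this card.

```python
import math

# Values taken from the tables in this card.
NUM_PAIRS = 893_000      # "893K" sentence pairs (a rounded figure)
TRAIN_BATCH_SIZE = 16    # per_device_train_batch_size on a single GPU
NUM_EPOCHS = 3           # num_train_epochs
EVAL_STEPS = 20_000      # eval_steps

# Assumptions (not stated in the card): no gradient accumulation, and the
# full 893K set is used for training rather than split for validation.
steps_per_epoch = math.ceil(NUM_PAIRS / TRAIN_BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS

# Steps at which an evaluation would be triggered during training.
eval_checkpoints = list(range(EVAL_STEPS, total_steps + 1, EVAL_STEPS))

print(steps_per_epoch, total_steps, eval_checkpoints[-1], len(eval_checkpoints))
```

Under these assumptions the run has roughly 167K steps and eight evaluation points, the last at step 160000.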
Evaluation Results
⚠️ Note: The validation set is fully synthetic (generated with NLLB-200). BLEU and chrF therefore measure similarity to synthetic references, not agreement with human translations.
| Step | Train Loss | Val Loss | BLEU | chrF |
|---|---|---|---|---|
| 20000 | 0.2423 | 0.2235 | 18.87 | 36.51 |
| 40000 | 0.1870 | 0.1806 | 24.73 | 42.64 |
| 60000 | 0.1633 | 0.1613 | 27.20 | 45.86 |
| 80000 | 0.1556 | 0.1497 | 30.25 | 48.49 |
| 100000 | 0.1479 | 0.1416 | 31.57 | 50.11 |
| 120000 | 0.1390 | 0.1325 | 33.89 | 52.53 |
| 140000 | 0.1317 | 0.1269 | 35.90 | 54.55 |
| 160000 | 0.1323 | 0.1243 | 36.57 | 55.15 |
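To see how quickly the scores level off, the gain between consecutive evaluations can be computed directly from the table. The snippet below is only an illustrative calculation on the reported numbers; the variable names are arbitrary.

```python
# (step, BLEU, chrF) rows copied from the evaluation table above.
rows = [
    (20_000, 18.87, 36.51),
    (40_000, 24.73, 42.64),
    (60_000, 27.20, 45.86),
    (80_000, 30.25, 48.49),
    (100_000, 31.57, 50.11),
    (120_000, 33.89, 52.53),
    (140_000, 35.90, 54.55),
    (160_000, 36.57, 55.15),
]

# Gain per 20k-step interval, rounded to the table's two decimals.
bleu_gains = [round(b[1] - a[1], 2) for a, b in zip(rows, rows[1:])]
chrf_gains = [round(b[2] - a[2], 2) for a, b in zip(rows, rows[1:])]

for (step, _, _), d_bleu, d_chrf in zip(rows[1:], bleu_gains, chrf_gains):
    print(f"step {step}: BLEU +{d_bleu:.2f}, chrF +{d_chrf:.2f}")
```

The first interval adds 5.86 BLEU while the last adds 0.67, so most of the measured improvement comes early in training.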
Example Translations
| English | Atlasic Tamazight (Ltn) | Atlasic Tamazight (Tfng) |
|---|---|---|
| I will go to school. | rad dduɣ s tinml. | ⵔⴰⴷ ⴷⴷⵓⵖ ⵙ ⵜⵉⵏⵎⵍ. |
| What did you say? | mayd tnnit? | ⵎⴰⵢⴷ ⵜⵏⵏⵉⵜ? |
| I want to know where Tom and Mary come from. | riɣ ad ssnɣ mani d yucka ṭum d mari. | ⵔⵉⵖ ⴰⴷ ⵙⵙⵏⵖ ⵎⴰⵏⵉ ⴷ ⵢⵓⵛⴽⴰ ⵟⵓⵎ ⴷ ⵎⴰⵔⵉ. |
| How many girls are there in this picture? | mnck n trbatin ayd illan g twlaft ad? | ⵎⵏⵛⴽ ⵏ ⵜⵔⴱⴰⵜⵉⵏ ⴰⵢⴷ ⵉⵍⵍⴰⵏ ⴳ ⵜⵡⵍⴰⴼⵜ ⴰⴷ? |
Hugging Face Space:
ilyasaqit/English-Tamazight-Translator
🪶 Notes
- The dataset is synthetic, not manually verified.
- The model performs best on short and simple general-domain sentences.
- Recommended decoding parameters:
  - num_beams=6
  - repetition_penalty=1.2–1.5
  - no_repeat_ngram_size=3
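A minimal inference sketch with the recommended decoding parameters might look like the following. It assumes the model ID from the citation URL below and the standard `MarianMTModel` / `MarianTokenizer` classes from 🤗 Transformers. The `translate` helper and `GENERATION_KWARGS` name are illustrative, not part of the model's release, and this card does not say whether a target-language prefix token is needed in the input.

```python
# Model ID taken from the citation URL in this card.
MODEL_ID = "ilyasaqit/opus-mt-en-atlasic_tamazight-synth893k-nmv"

# Decoding settings recommended in the notes above.
GENERATION_KWARGS = {
    "num_beams": 6,
    "repetition_penalty": 1.5,   # the card recommends a value from 1.2 to 1.5
    "no_repeat_ngram_size": 3,
    "max_length": 140,           # max_length from the training table
}


def translate(sentences, model_id=MODEL_ID):
    """Translate a list of English sentences (hypothetical helper)."""
    # Imported here so the settings above can be inspected without
    # loading the transformers dependency.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)

    # Tokenize as a padded batch of PyTorch tensors.
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    output_ids = model.generate(**batch, **GENERATION_KWARGS)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

Calling `translate(["I will go to school."])` downloads the weights on first use and returns a list with one translated string. Lowering `repetition_penalty` toward 1.2 is worth trying if outputs look truncated or unnatural.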
Citation
If you use this model, please cite:
@misc{marian-en-tamazight-2025,
title = {MarianMT English → Atlasic Tamazight (Tachelhit / Central Atlas)},
year = {2025},
url = {https://huggingface.co/ilyasaqit/opus-mt-en-atlasic_tamazight-synth893k-nmv}
}