πŸ”οΈ MarianMT English β†’ Atlasic Tamazight (Tachelhit / Central Atlas Tamazight)

This model is a fine-tuned version of Helsinki-NLP/opus-mt-en-ber that translates from English into Atlasic Tamazight (Tachelhit / Central Atlas Tamazight).


## πŸ“˜ Model Overview

| Property | Description |
|---|---|
| Base Model | Helsinki-NLP/opus-mt-en-ber |
| Architecture | MarianMT |
| Languages | English β†’ Tamazight (Tachelhit / Central Atlas Tamazight) |
| Fine-tuning Dataset | 893K medium-quality synthetic sentence pairs, generated by translating English corpora with NLLB-200 |
| Training Objective | Sequence-to-sequence translation fine-tuning |
| Framework | πŸ€— Transformers |
| Tokenizer | SentencePiece |
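A minimal inference sketch with πŸ€— Transformers. The `translate` helper is illustrative (not part of the released code); the decoding arguments follow the recommended parameters in the Notes section.

```python
from transformers import MarianMTModel, MarianTokenizer

MODEL_ID = "ilyasaqit/opus-mt-en-atlasic_tamazight-synth893k-nmv"

def translate(sentences):
    """Translate a list of English sentences into Atlasic Tamazight.

    Downloads the model weights from the Hub on first call.
    """
    tokenizer = MarianTokenizer.from_pretrained(MODEL_ID)
    model = MarianMTModel.from_pretrained(MODEL_ID)
    batch = tokenizer(sentences, return_tensors="pt", padding=True,
                      truncation=True, max_length=140)
    outputs = model.generate(
        **batch,
        num_beams=6,
        no_repeat_ngram_size=3,
        repetition_penalty=1.5,
        max_length=140,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

# print(translate(["I will go to school."]))  # card's example output: rad dduΙ£ s tinml.
```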

## 🧠 Training Details

| Hyperparameter | Value |
|---|---|
| `per_device_train_batch_size` | 16 |
| `per_device_eval_batch_size` | 64 |
| `learning_rate` | 2e-5 |
| `num_train_epochs` | 3 |
| `max_length` | 140 |
| `num_beams` | 6 |
| `eval_steps` | 20000 |
| `save_steps` | 20000 |
| `generation_no_repeat_ngram_size` | 3 |
| `generation_repetition_penalty` | 1.5 |

Training Environment:
- 1 Γ— NVIDIA P100 (16 GB) on Kaggle
- Total training time: 9 h 50 m 28 s
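The hyperparameters above correspond roughly to the following `Seq2SeqTrainingArguments`. This is a sketch, not the released training script: `output_dir` is a placeholder, and the evaluation-strategy setting is an assumption.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the training configuration (output_dir is a placeholder).
args = Seq2SeqTrainingArguments(
    output_dir="opus-mt-en-atlasic_tamazight",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    learning_rate=2e-5,
    num_train_epochs=3,
    eval_strategy="steps",   # assumed; evaluation every eval_steps
    eval_steps=20_000,
    save_steps=20_000,
    predict_with_generate=True,
    generation_max_length=140,
    generation_num_beams=6,
)
```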

## πŸ“ˆ Evaluation Results

⚠️ Note: The validation set is fully synthetic (NLLB-200). BLEU only measures similarity to synthetic outputs, not human-level accuracy.

| Step | Train Loss | Val Loss | BLEU | chrF |
|---|---|---|---|---|
| 20000 | 0.2423 | 0.2235 | 18.87 | 36.51 |
| 40000 | 0.1870 | 0.1806 | 24.73 | 42.64 |
| 60000 | 0.1633 | 0.1613 | 27.20 | 45.86 |
| 80000 | 0.1556 | 0.1497 | 30.25 | 48.49 |
| 100000 | 0.1479 | 0.1416 | 31.57 | 50.11 |
| 120000 | 0.1390 | 0.1325 | 33.89 | 52.53 |
| 140000 | 0.1317 | 0.1269 | 35.90 | 54.55 |
| 160000 | 0.1323 | 0.1243 | 36.57 | 55.15 |

## πŸ’¬ Example Translations

| English | Atlasic Tamazight (Latin) | Atlasic Tamazight (Tifinagh) |
|---|---|---|
| I will go to school. | rad dduΙ£ s tinml. | β΅”β΄°β΄· β΄·β΄·β΅“β΅– β΅™ β΅œβ΅‰β΅β΅Žβ΅. |
| What did you say? | mayd tnnit? | ⡎ⴰ⡒ⴷ β΅œβ΅β΅β΅‰β΅œ? |
| I want to know where Tom and Mary come from. | riΙ£ ad ssnΙ£ mani d yucka αΉ­um d mari. | ⡔⡉⡖ β΄°β΄· ⡙⡙⡏⡖ β΅Žβ΄°β΅β΅‰ β΄· β΅’β΅“β΅›β΄½β΄° β΅Ÿβ΅“β΅Ž β΄· β΅Žβ΄°β΅”β΅‰. |
| How many girls are there in this picture? | mnck n trbatin ayd illan g twlaft ad? | β΅Žβ΅β΅›β΄½ ⡏ β΅œβ΅”β΄±β΄°β΅œβ΅‰β΅ β΄°β΅’β΄· ⡉⡍⡍ⴰ⡏ β΄³ ⡜⡑⡍ⴰⴼ⡜ β΄°β΄·? |

Hugging Face Space:
πŸ‘‰ ilyasaqit/English-Tamazight-Translator


## πŸͺΆ Notes

- The dataset is synthetic and not manually verified.
- The model performs best on short, simple general-domain sentences.
- Recommended decoding parameters:
  - `num_beams=6`
  - `repetition_penalty=1.2–1.5`
  - `no_repeat_ngram_size=3`
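The recommendations above can be packaged as a `GenerationConfig` and passed to `model.generate`. A minimal sketch; the repetition penalty here is set to 1.2, the low end of the recommended range:

```python
from transformers import GenerationConfig

# Recommended decoding settings from the notes above.
gen_config = GenerationConfig(
    num_beams=6,
    repetition_penalty=1.2,  # anywhere in 1.2-1.5 is recommended
    no_repeat_ngram_size=3,
    max_length=140,  # matches the training-time max_length
)

# Usage: model.generate(**inputs, generation_config=gen_config)
```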

## πŸ“š Citation

If you use this model, please cite:

```bibtex
@misc{marian-en-tamazight-2025,
  title  = {MarianMT English β†’ Atlasic Tamazight (Tachelhit / Central Atlas)},
  year   = {2025},
  url    = {https://huggingface.co/ilyasaqit/opus-mt-en-atlasic_tamazight-synth893k-nmv}
}
```