ilyasaqit/opus-mt-en-atlasic_tamazight-synth169k-nmv

+---
+language:
+- en
+- tzm
+- shi
+- zgh
+tags:
+- translation
+- marian
+- tamazight
+- tachelhit
+- central-atlas
+license: mit
+datasets:
+- synthetic
+metrics:
+- bleu
+base_model:
+- Helsinki-NLP/opus-mt-en-ber
+---
+# 🏔️ MarianMT English → Atlasic Tamazight (Tachelhit / Central Atlas Tamazight)
+This model is a **fine-tuned version of [Helsinki-NLP/opus-mt-en-ber](https://huggingface.co/Helsinki-NLP/opus-mt-en-ber)** that translates from **English → Atlasic Tamazight** (**Tachelhit**/**Central Atlas Tamazight**).
+
+---
+## 📘 Model Overview
+| Property | Description |
+|-----------|-------------|
+| **Base Model** | `Helsinki-NLP/opus-mt-en-ber` |
+| **Architecture** | MarianMT |
+| **Languages** | English → Tamazight (Tachelhit / Central Atlas Tamazight) |
+| **Fine-tuning Dataset** | 169K **medium-quality synthetic sentence pairs** generated by translating English corpora |
+| **Training Objective** | Sequence-to-sequence translation fine-tuning |
+| **Framework** | 🤗 Transformers |
+| **Tokenizer** | SentencePiece |
+---
+## 🧠 Training Details
+| Hyperparameter | Value |
+|----------------|--------|
+| `per_device_train_batch_size` | 16 |
+| `per_device_eval_batch_size` | 48 |
+| `learning_rate` | 2e-5 |
+| `num_train_epochs` | 8 |
+| `max_length` | 128 |
+| `num_beams` | 5 |
+| `eval_steps` | 5000 |
+| `save_steps` | 5000 |
+| `generation_no_repeat_ngram_size` | 3 |
+| `generation_repetition_penalty` | 1.5 |
+
+**Training Environment:**
+- 1 × NVIDIA **P100 (16 GB)** on **Kaggle**
+- Total training time: **6 h 34 m**
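
For readers who want to reproduce a comparable setup, the table above maps roughly onto 🤗 Transformers' `Seq2SeqTrainingArguments` plus a `GenerationConfig`. This is a minimal sketch, not the author's training script: the `output_dir` value, the `eval_strategy="steps"` setting (named `evaluation_strategy` in older Transformers releases), `predict_with_generate=True`, and routing the two `generation_*` settings through `GenerationConfig` are all assumptions.

```python
# Hyperparameters copied from the table above.
TRAINING_HPARAMS = {
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 48,
    "learning_rate": 2e-5,
    "num_train_epochs": 8,
    "eval_steps": 5000,
    "save_steps": 5000,
}

# Decoding settings used for evaluation-time generation (also from the table).
GENERATION_HPARAMS = {
    "max_length": 128,
    "num_beams": 5,
    "no_repeat_ngram_size": 3,
    "repetition_penalty": 1.5,
}


def build_training_args(output_dir="marian-en-tamazight"):  # hypothetical path
    """Assemble trainer arguments; assumes a recent transformers release."""
    # Imported lazily so the dictionaries above can be inspected
    # without installing transformers.
    from transformers import GenerationConfig, Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(
        output_dir=output_dir,
        eval_strategy="steps",        # assumption; `evaluation_strategy` on older versions
        save_strategy="steps",        # assumption
        predict_with_generate=True,   # assumption: lets the trainer decode text for BLEU
        generation_config=GenerationConfig(**GENERATION_HPARAMS),
        **TRAINING_HPARAMS,
    )
```

The resulting object would be passed to a `Seq2SeqTrainer` together with the tokenized dataset and a BLEU `compute_metrics` function, neither of which is described in this card.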
+---
+## 📈 Evaluation Results
+| Step | Train Loss | Val Loss | BLEU |
+|------|-------------|-----------|------|
+| 5000 | 0.4258 | 0.4082 | 2.01 |
+| 10000 | 0.3694 | 0.3511 | 6.09 |
+| 15000 | 0.3419 | 0.3232 | 7.83 |
+| 20000 | 0.3148 | 0.3054 | 8.44 |
+| 25000 | 0.2965 | 0.2923 | 9.79 |
+| 30000 | 0.2895 | 0.2824 | 10.19 |
+| 35000 | 0.2755 | 0.2756 | 11.26 |
+| 40000 | 0.2733 | 0.2691 | 11.75 |
+| 45000 | 0.2623 | 0.2649 | 12.26 |
+| 50000 | 0.2581 | 0.2598 | 12.64 |
+| 55000 | 0.2490 | 0.2567 | 12.83 |
+| 60000 | 0.2520 | 0.2539 | 13.47 |
+| 65000 | 0.2428 | 0.2518 | 13.60 |
+| 70000 | 0.2376 | 0.2500 | 13.77 |
+| 75000 | 0.2376 | 0.2488 | 13.87 |
+| 80000 | 0.2362 | 0.2479 | **13.96** |
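
The step column can be sanity-checked against the training settings. The sketch below is a back-of-the-envelope calculation, assuming single-GPU training with no gradient accumulation and, as an upper bound, that all 169K pairs are used for training. The card does not state the train/validation split, so the real step count may be lower.

```python
import math


def total_optimizer_steps(num_pairs, batch_size, epochs):
    """Steps for `epochs` passes over `num_pairs` examples, one update per batch."""
    return math.ceil(num_pairs / batch_size) * epochs


def eval_checkpoints(total_steps, eval_every):
    """Steps at which evaluation runs when evaluating every `eval_every` steps."""
    return list(range(eval_every, total_steps + 1, eval_every))


# Values from the card; 169_000 is an upper bound on the training-set size.
steps = total_optimizer_steps(num_pairs=169_000, batch_size=16, epochs=8)
checkpoints = eval_checkpoints(steps, eval_every=5000)

print(steps)             # 84504
print(len(checkpoints))  # 16
print(checkpoints[-1])   # 80000
```

Under these assumptions the final evaluation falls at step 80000 after 16 checkpoints, which agrees with the rows reported in the table.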
+---
+## 💬 Example Translations
+| English | Atlasic Tamazight |
+|----------|------------------|
+| I will go to school. | **Rad ftuɣ s tinml.** |
+| What did you say? | **Mad tnnit?** |
+| I'm not talking to you, I'm talking to him! | **Ur ar gis sawalɣ, ar ak sawalɣ!** |
+| Everyone has a secret face. | **Kraygatt yan ila waḥdut.** |
+---
+Hugging Face Space:
+👉 [**ilyasaqit/English-Tamazight-Translator**](https://huggingface.co/spaces/ilyasaqit/English-Tamazight-Translator)
+
+---
+## 🪶 Notes
+- The dataset is **synthetic**, not manually verified.
+- The model performs best on **short and simple general-domain sentences**.
+- Recommended decoding parameters:
+  - `num_beams=5`
+  - `repetition_penalty` between `1.2` and `1.5`
+  - `no_repeat_ngram_size=3`
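
The recommended settings can be passed directly to `generate()`. The following is a minimal inference sketch, assuming the checkpoint is published under the repository name shown at the top of this page and loads with the standard `MarianMTModel` and `MarianTokenizer` classes. The `translate` helper and the `MODEL_ID` and `DECODING_KWARGS` names are illustrative and not part of the model card.

```python
# Assumed repository id, taken from the page header; adjust if it differs.
MODEL_ID = "ilyasaqit/opus-mt-en-atlasic_tamazight-synth169k-nmv"

# Decoding parameters recommended in the notes above.
DECODING_KWARGS = {
    "num_beams": 5,
    "repetition_penalty": 1.2,  # the card suggests a value between 1.2 and 1.5
    "no_repeat_ngram_size": 3,
    "max_length": 128,          # mirrors the max_length listed under Training Details
}


def translate(sentences, model_id=MODEL_ID):
    """Translate a list of English sentences using beam search."""
    # Lazy import: requires `transformers`, `sentencepiece`, and `torch`.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)

    # Tokenize as a padded batch of PyTorch tensors.
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    output_ids = model.generate(**batch, **DECODING_KWARGS)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

Calling `translate(["I will go to school."])` downloads the model on first use and returns a list containing one translated string. Raising `repetition_penalty` toward 1.5 may help if outputs repeat words on longer inputs.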
+---
+## 📚 Citation
+If you use this model, please cite:
+```bibtex
+@misc{marian-en-tamazight-2025,
+  title  = {MarianMT English → Atlasic Tamazight (Tachelhit / Central Atlas)},
+  year   = {2025},
+  url    = {https://huggingface.co/ilyasaqit/stage2_marian_opus_synth_model_kaggle3}
+}
+```