Commit 6de4d9c (verified), committed by ilyasaqit · 1 parent: e01210b

Create README.md

Files changed (1)
  1. README.md +121 -0
README.md ADDED
@@ -0,0 +1,121 @@
1
+ ---
2
+ language:
3
+ - en
4
+ - tzm
5
+ - shi
6
+ - zgh
7
+ tags:
8
+ - translation
9
+ - marian
10
+ - tamazight
11
+ - tachelhit
12
+ - central-atlas
13
+ license: mit
14
+ datasets:
15
+ - synthetic
16
+ metrics:
17
+ - bleu
18
+ base_model:
19
+ - Helsinki-NLP/opus-mt-en-ber
20
+ ---
21
+
22
+ # 🏔️ MarianMT English → Atlasic Tamazight (Tachelhit / Central Atlas Tamazight)
23
+
24
+ This model is a **fine-tuned version of [Helsinki-NLP/opus-mt-en-ber](https://huggingface.co/Helsinki-NLP/opus-mt-en-ber)** that translates from **English → Atlasic Tamazight** (**Tachelhit**/**Central Atlas Tamazight**).
25
+
26
+ ---
27
+
28
+ ## 📘 Model Overview
29
+
30
+ | Property | Description |
31
+ |-----------|-------------|
32
+ | **Base Model** | `Helsinki-NLP/opus-mt-en-ber` |
33
+ | **Architecture** | MarianMT |
34
+ | **Languages** | English → Tamazight (Tachelhit / Central Atlas Tamazight) |
35
+ | **Fine-tuning Dataset** | 169K **medium-quality synthetic sentence pairs** generated by translating English corpora |
36
+ | **Training Objective** | Sequence-to-sequence translation fine-tuning |
37
+ | **Framework** | 🤗 Transformers |
38
+ | **Tokenizer** | SentencePiece |
39
+
40
+ ---
41
+
42
+ ## 🧠 Training Details
43
+
44
+ | Hyperparameter | Value |
45
+ |----------------|--------|
46
+ | `per_device_train_batch_size` | 16 |
47
+ | `per_device_eval_batch_size` | 48 |
48
+ | `learning_rate` | 2e-5 |
49
+ | `num_train_epochs` | 8 |
50
+ | `max_length` | 128 |
51
+ | `num_beams` | 5 |
52
+ | `eval_steps` | 5000 |
53
+ | `save_steps` | 5000 |
54
+ | `generation_no_repeat_ngram_size` | 3 |
55
+ | `generation_repetition_penalty` | 1.5 |
56
+
57
+ **Training Environment:**
58
+ - 1 × NVIDIA **P100 (16 GB)** on **Kaggle**
59
+ - Total training time: **6 h 34 m**
60
+
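As a rough sanity check on the hyperparameters above, the number of optimizer steps can be estimated from the dataset size, batch size, and epoch count. This is only a sketch under stated assumptions: "169K" is treated as exactly 169,000 pairs, all of them are assumed to be used for training (the card does not say whether a validation split was held out), and gradient accumulation is assumed to be 1. The helper name `estimate_steps` is hypothetical.

```python
import math


def estimate_steps(num_pairs: int, batch_size: int, epochs: int, grad_accum: int = 1):
    """Estimate optimizer steps per epoch and in total (last partial batch kept)."""
    steps_per_epoch = math.ceil(num_pairs / (batch_size * grad_accum))
    return steps_per_epoch, steps_per_epoch * epochs


# Assumed inputs: 169,000 pairs, batch size 16, 8 epochs (values from the card).
steps_per_epoch, total_steps = estimate_steps(169_000, 16, 8)

# With eval_steps=5000, the last evaluation falls on the largest multiple
# of 5000 that does not exceed the total step count.
last_eval_step = (total_steps // 5000) * 5000

print(steps_per_epoch, total_steps, last_eval_step)
```

Under these assumptions the estimate is about 84.5K steps, so the last evaluation would land at step 80000. If part of the data was held out for validation, the true step count would be lower.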
61
+ ---
62
+
63
+ ## 📈 Evaluation Results
64
+
65
+ | Step | Train Loss | Val Loss | BLEU |
66
+ |------|-------------|-----------|------|
67
+ | 5000 | 0.4258 | 0.4082 | 2.01 |
68
+ | 10000 | 0.3694 | 0.3511 | 6.09 |
69
+ | 15000 | 0.3419 | 0.3232 | 7.83 |
70
+ | 20000 | 0.3148 | 0.3054 | 8.44 |
71
+ | 25000 | 0.2965 | 0.2923 | 9.79 |
72
+ | 30000 | 0.2895 | 0.2824 | 10.19 |
73
+ | 35000 | 0.2755 | 0.2756 | 11.26 |
74
+ | 40000 | 0.2733 | 0.2691 | 11.75 |
75
+ | 45000 | 0.2623 | 0.2649 | 12.26 |
76
+ | 50000 | 0.2581 | 0.2598 | 12.64 |
77
+ | 55000 | 0.2490 | 0.2567 | 12.83 |
78
+ | 60000 | 0.2520 | 0.2539 | 13.47 |
79
+ | 65000 | 0.2428 | 0.2518 | 13.60 |
80
+ | 70000 | 0.2376 | 0.2500 | 13.77 |
81
+ | 75000 | 0.2376 | 0.2488 | 13.87 |
82
+ | 80000 | 0.2362 | 0.2479 | **13.96** |
83
+
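One way to read the evaluation table is to compare the BLEU gain in the earlier checkpoints with the gain in the later ones. The snippet below is a small illustrative calculation using three values copied from the table; the variable names are hypothetical and nothing here is part of the training code.

```python
# (step -> BLEU) values copied from the evaluation table.
bleu_by_step = {5000: 2.01, 40000: 11.75, 80000: 13.96}

# Gain between the first logged checkpoint and step 40000,
# and between step 40000 and the final checkpoint.
gain_5k_to_40k = round(bleu_by_step[40000] - bleu_by_step[5000], 2)
gain_40k_to_80k = round(bleu_by_step[80000] - bleu_by_step[40000], 2)

print(gain_5k_to_40k, gain_40k_to_80k)
```

Most of the improvement arrives before step 40000; the later checkpoints still improve, but more slowly.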
84
+ ---
85
+
86
+ ## 💬 Example Translations
87
+
88
+ | English | Atlasic Tamazight |
89
+ |----------|------------------|
90
+ | I will go to school. | **Rad ftuɣ s tinml.** |
91
+ | What did you say? | **Mad tnnit?** |
92
+ | I'm not talking to you, I'm talking to him! | **Ur ar gis sawalɣ, ar ak sawalɣ!** |
93
+ | Everyone has a secret face. | **Kraygatt yan ila waḥdut.** |
94
+
95
+ ---
96
+
97
+ Try the model in this Hugging Face Space:
98
+ 👉 [**ilyasaqit/English-Tamazight-Translator**](https://huggingface.co/spaces/ilyasaqit/English-Tamazight-Translator)
99
+
100
+ ---
101
+
102
+ ## 🪶 Notes
103
+
104
+ - The dataset is **synthetic**, not manually verified.
105
+ - The model performs best on **short and simple general-domain sentences**.
106
+ - Recommended decoding parameters:
107
+   - `num_beams=5`
108
+   - `repetition_penalty=1.2–1.5`
109
+   - `no_repeat_ngram_size=3`
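
The recommended decoding parameters can be passed to `generate` roughly as follows. This is a minimal sketch, not the author's inference script: the model ID is assumed from the citation URL on this card, `max_length=128` is borrowed from the training table, and the helper names `generation_kwargs` and `translate` are hypothetical. It requires `transformers`, `torch`, and `sentencepiece` to be installed.

```python
# Assumption: repo ID inferred from the citation URL on this card.
MODEL_ID = "ilyasaqit/stage2_marian_opus_synth_model_kaggle3"


def generation_kwargs(repetition_penalty: float = 1.5) -> dict:
    """Decoding settings from the Notes section (repetition_penalty in 1.2-1.5)."""
    return {
        "num_beams": 5,
        "repetition_penalty": repetition_penalty,
        "no_repeat_ngram_size": 3,
        "max_length": 128,  # assumption: reuse the training-time max_length
    }


def translate(sentences, model_id: str = MODEL_ID):
    """Translate a list of English sentences; downloads the model on first use."""
    # Imported lazily so the settings above can be inspected without transformers.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)
    batch = tokenizer(
        sentences,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=128,
    )
    output_ids = model.generate(**batch, **generation_kwargs())
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

Calling `translate(["I will go to school."])` returns a list with one translated string. Lowering `repetition_penalty` toward 1.2 via `generation_kwargs(1.2)` may be worth trying if outputs look truncated or unnatural.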
110
+ ---
111
+
112
+ ## 📚 Citation
113
+
114
+ If you use this model, please cite:
115
+
116
+ ```bibtex
117
+ @misc{marian-en-tamazight-2025,
118
+ title = {MarianMT English → Atlasic Tamazight (Tachelhit / Central Atlas)},
119
+ year = {2025},
120
+ url = {https://huggingface.co/ilyasaqit/stage2_marian_opus_synth_model_kaggle3}
121
+ }
+ ```