---
base_model: aisingapore/Llama-SEA-LION-v2-8B
language: tha
license: cc-by-nc-4.0
model_name: airesearch/llama3-8b-cpt-sea-lionv2-base-dolly-th-2.5k-wangchan-instruct-2.5k
---

# llama3-8b-cpt-sea-lionv2-base-dolly-th-2.5k-wangchan-instruct-2.5k

WangchanThaiInstruct: An Instruction-Following Dataset for Culture-Aware, Multitask, and Multi-domain Evaluation in Thai (EMNLP'25)

This repository contains the model artifacts for **llama3-8b-cpt-sea-lionv2-base-dolly-th-2.5k-wangchan-instruct-2.5k** from the WangchanThaiInstruct paper.

# Training

The model is aisingapore/Llama-SEA-LION-v2-8B fine-tuned on 2,500 randomly sampled examples from a machine-translated [Dolly 15K](https://huggingface.co/datasets/Thaweewat/databricks-dolly-15k-th) and 2,500 randomly sampled examples from [WangchanThaiInstruct's training set](https://huggingface.co/datasets/airesearch/WangchanThaiInstruct), using the Llama Factory framework with the following hyperparameters:

| Hyperparameter         | Value     |
|------------------------|-----------|
| Learning Rate          | 2 × 10⁻⁴  |
| Learning Rate Schedule | Cosine    |
| Batch Size (effective) | 128       |
| Max Token Length       | 2048      |
| Warmup Ratio           | 0.1       |
| Epochs                 | 3         |
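The two 2,500-example subsets can be reproduced along the following lines with the Hugging Face `datasets` library. This is a minimal sketch, not the exact preprocessing pipeline: the split names follow the dataset cards, and the shuffle seed is an assumption, since the seed used for the released checkpoint is not documented.

```python
from datasets import load_dataset

# Machine-translated Dolly 15K (Thai) and the WangchanThaiInstruct training split.
dolly_th = load_dataset("Thaweewat/databricks-dolly-15k-th", split="train")
wangchan = load_dataset("airesearch/WangchanThaiInstruct", split="train")

# Randomly sample 2,500 examples from each source. seed=42 is an assumption;
# the actual seed used for this checkpoint is not documented.
dolly_subset = dolly_th.shuffle(seed=42).select(range(2500))
wangchan_subset = wangchan.shuffle(seed=42).select(range(2500))

print(len(dolly_subset), len(wangchan_subset))  # 2500 2500
```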
# Evaluation

The model was evaluated on [Thai MT-Bench](https://huggingface.co/datasets/ThaiLLM-Leaderboard/mt-bench-thai), [SeaCrowd's Thai NLU and NLG splits](https://github.com/scb-10x/seacrowd-eval), and [WangchanThaiInstruct's test set](https://huggingface.co/datasets/airesearch/WangchanThaiInstruct).

| Model | MT Bench Average | NLU Accuracy (%) | NLG Translation (BLEU) | NLG Generation (ROUGE-L) | WangchanThaiInstruct Fluency | WangchanThaiInstruct Accuracy (%) | WangchanThaiInstruct Rating |
|----------------------------------------|------------------|------------------|-------------------------|--------------------------|-------------------------------|----------------------------------|-----------------------------|
| **Llama-3.1-8B**                       |                  |                  |                         |                          |                               |                                  |                             |
| Alpaca 5k + WangchanThaiInstruct 5k    | 3.00             | 47.22            | 3.12                    | 8.59                     | 4.08                          | 39.84                            | 4.16                        |
| Alpaca 10k                             | 3.05             | 46.54            | 4.08                    | 11.05                    | 3.36                          | 28.39                            | 3.23                        |
| Alpaca 10k + WangchanThaiInstruct 10k  | 3.07             | 46.47            | 2.43                    | 8.54                     | 4.21                          | 42.31                            | 4.39                        |
| Alpaca 20k                             | 2.75             | 47.31            | 2.79                    | 9.14                     | 2.77                          | 22.32                            | 2.94                        |
| Alpaca 15k + WangchanThaiInstruct 15k  | 3.26             | 46.45            | 3.47                    | 8.58                     | 4.35                          | 42.16                            | 4.46                        |
| Alpaca 30k                             | 2.88             | 47.67            | 3.65                    | 9.65                     | 2.83                          | 21.83                            | 2.95                        |
| Dolly 2.5k + WangchanThaiInstruct 2.5k | 2.40             | 46.43            | 3.75                    | 8.72                     | 3.57                          | 35.93                            | 3.72                        |
| Dolly 5k                               | 1.88             | 42.87            | 0.95                    | 8.55                     | 1.75                          | 22.70                            | 2.19                        |
| Dolly 5k + WangchanThaiInstruct 5k     | 2.28             | 46.43            | 1.36                    | 8.55                     | 3.85                          | 37.89                            | 3.98                        |
| Dolly 10k                              | 1.99             | 42.41            | 1.35                    | 8.64                     | 1.69                          | 22.35                            | 2.14                        |
| Dolly 7.5k + WangchanThaiInstruct 7.5k | 2.31             | 46.37            | 1.48                    | 8.59                     | 3.96                          | 39.63                            | 4.11                        |
| Dolly 15k                              | 2.64             | 42.47            | 1.60                    | 8.10                     | 1.69                          | 22.21                            | 2.16                        |
| **Gemma-2-9B**                         |                  |                  |                         |                          |                               |                                  |                             |
| Alpaca 5k + WangchanThaiInstruct 5k    | 4.25             | 53.70            | 2.25                    | 8.14                     | 4.85                          | 54.24                            | 5.17                        |
| Alpaca 10k                             | 3.98             | 51.71            | 1.39                    | 6.84                     | 4.00                          | 46.26                            | 4.26                        |
| Alpaca 10k + WangchanThaiInstruct 10k  | 4.02             | 53.81            | 2.02                    | 8.09                     | 4.97                          | 55.33                            | 5.30                        |
| Alpaca 20k                             | 4.14             | 52.40            | 1.45                    | 6.95                     | 3.53                          | 38.07                            | 3.90                        |
| Alpaca 15k + WangchanThaiInstruct 15k  | 4.20             | 53.49            | 1.98                    | 8.02                     | 5.14                          | 56.67                            | 5.49                        |
| Alpaca 30k                             | 3.79             | 52.41            | 1.25                    | 5.73                     | 3.25                          | 32.71                            | 3.43                        |
| Dolly 2.5k + WangchanThaiInstruct 2.5k | 3.66             | 54.62            | 1.75                    | 8.07                     | 4.30                          | 51.86                            | 4.84                        |
| Dolly 5k                               | 2.59             | 53.36            | 1.39                    | 7.58                     | 1.71                          | 42.35                            | 2.45                        |
| Dolly 5k + WangchanThaiInstruct 5k     | 3.99             | 53.50            | 1.54                    | 8.12                     | 4.59                          | 54.31                            | 5.08                        |
| Dolly 10k                              | 2.70             | 51.98            | 1.52                    | 7.58                     | 1.81                          | 43.68                            | 2.74                        |
| Dolly 7.5k + WangchanThaiInstruct 7.5k | 4.13             | 53.34            | 1.63                    | 8.12                     | 4.72                          | 55.09                            | 5.24                        |
| Dolly 15k                              | 4.10             | 51.35            | 1.48                    | 7.76                     | 3.24                          | 40.34                            | 2.63                        |
| **SEA-LIONv2-8B**                      |                  |                  |                         |                          |                               |                                  |                             |
| Alpaca 5k + WangchanThaiInstruct 5k    | 4.52             | 43.76            | 34.47                   | 19.39                    | 5.62                          | 52.84                            | 5.57                        |
| Alpaca 10k                             | 4.54             | 43.31            | 28.01                   | 25.35                    | 4.61                          | 48.88                            | 4.73                        |
| Alpaca 10k + WangchanThaiInstruct 10k  | 4.55             | 44.66            | 24.00                   | 17.55                    | 5.72                          | 53.93                            | 5.70                        |
| Alpaca 20k                             | 4.74             | 43.98            | 24.22                   | 25.82                    | 4.73                          | 49.32                            | 4.53                        |
| Alpaca 15k + WangchanThaiInstruct 15k  | 4.44             | 44.51            | 20.58                   | 16.31                    | 5.54                          | 53.94                            | 5.61                        |
| Alpaca 30k                             | 4.60             | 42.96            | 15.58                   | 25.68                    | 5.11                          | 49.66                            | 4.78                        |
| Dolly 2.5k + WangchanThaiInstruct 2.5k | 4.25             | 44.89            | 36.60                   | 26.82                    | 5.10                          | 50.25                            | 5.28                        |
| Dolly 5k                               | 3.69             | 45.88            | 19.22                   | 35.66                    | 3.46                          | 48.04                            | 4.11                        |
| Dolly 5k + WangchanThaiInstruct 5k     | 4.21             | 44.30            | 15.64                   | 23.72                    | 5.31                          | 51.25                            | 5.42                        |
| Dolly 10k                              | 3.83             | 46.57            | 14.07                   | 37.35                    | 4.09                          | 46.81                            | 4.04                        |
| Dolly 7.5k + WangchanThaiInstruct 7.5k | 4.31             | 45.31            | 13.54                   | 22.00                    | 5.54                          | 53.81                            | 5.57                        |
| Dolly 15k                              | 3.57             | 46.14            | 14.31                   | 35.37                    | 3.24                          | 48.13                            | 4.15                        |

# Citation

```
@inproceedings{limkonchotiwat2025thaiinstruct,
  title = {WangchanThaiInstruct: An Instruction-Following Dataset for Culture-Aware, Multitask, and Multi-domain Evaluation in Thai},
  author = {Limkonchotiwat, Peerat and Tuchinda, Pume and Lowphansirikul, Lalita and Nonesung, Surapon and Tasawong, Panuthep and Aji, Alham Fikri and Udomcharoenchaikit, Can and Nutanong, Sarana},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  year = {2025},
  publisher = {Association for Computational Linguistics}
}
```
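# Usage

The model can be loaded with Hugging Face Transformers like any Llama-3-based causal LM. The snippet below is an illustrative sketch, not the generation configuration used in the evaluation above; the card does not specify a chat template, so a plain-text Thai prompt is used (it asks the model to briefly explain the benefits of exercise).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "airesearch/llama3-8b-cpt-sea-lionv2-base-dolly-th-2.5k-wangchan-instruct-2.5k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed precision; adjust to your hardware
    device_map="auto",
)

# Illustrative Thai instruction: "Briefly explain the benefits of exercise."
prompt = "อธิบายประโยชน์ของการออกกำลังกายสั้น ๆ"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```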