---
library_name: peft
license: apache-2.0
base_model: unsloth/SmolLM2-360M-Instruct
tags:
- unsloth
- trl
- sft
- generated_from_trainer
model-index:
- name: SmolLM2-360M-Instruct-TaiwanChat
results: []
---
[View the training run in Weights & Biases](https://wandb.ai/pesi/SmolLM2-360M-Instruct-TaiwanChat_CLOUD/runs/9fnxruem)
# SmolLM2-360M-Instruct-TaiwanChat
This model is a fine-tuned version of [unsloth/SmolLM2-360M-Instruct](https://huggingface.co/unsloth/SmolLM2-360M-Instruct) on the TaiwanChat dataset using Unsloth’s 4-bit quantization and LoRA adapters for efficient instruction-following in Traditional Chinese.
## Installation
```bash
pip install -r requirements.txt
```
## Requirements
* **Python**: 3.8 or higher
* **CUDA**: 11.0 or higher (for GPU support)
* All other dependencies and exact versions are specified in [requirements.txt](requirements.txt).
## Model description
* **Base**: SmolLM2-360M-Instruct (360M parameters)
* **Quantization**: 4-bit weight quantization (activations in full precision)
* **Adapters**: LoRA with rank `r=16`, alpha `α=16`, dropout `0.0`, applied to the projection layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
* **Dataset**: TaiwanChat (`yentinglin/TaiwanChat`) — 600k filtered examples, max length 512, streamed and deduplicated, then split 90% train / 10% validation
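For reference, the load and adapter configuration above map onto Unsloth's API roughly as follows. This is a minimal sketch, not the exact training script; `max_seq_length` and `random_state` are inferred from the training setup described below.
```python
from unsloth import FastLanguageModel

# Load the base model with 4-bit quantized weights; activations stay in full precision
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/SmolLM2-360M-Instruct",
    max_seq_length=512,
    load_in_4bit=True,
    full_finetuning=False,
)

# Attach LoRA adapters to every projection layer with the settings listed above
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    random_state=3407,  # matches the training seed in the table below
)
```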
## Intended uses & limitations
**Intended uses:**
* Conversational AI and chatbots handling Traditional Chinese queries (e.g., weather, FAQs).
* Instruction-following in a dialogue format.
**Limitations:**
* The model's small capacity (360M parameters) can lead to occasional hallucinations or vague answers.
* Performance was measured on a 10% hold-out split; distribution shift in real-world data may reduce quality.
* Quantization and adapter-based tuning trade some accuracy for efficiency.
## Training procedure
1. **Data preparation**
* Streamed 600k examples from the HF dataset, filtered to `max_len=512`, cleaned assistant markers via regex, then shuffled and split with `Dataset.train_test_split(test_size=0.1)`; see the data-preparation sketch after this list
2. **Model & training setup**
* Loaded base with `FastLanguageModel.from_pretrained(..., load_in_4bit=True, full_finetuning=False)`
* Applied LoRA adapters via `FastLanguageModel.get_peft_model(...)`
* Used a `LoggingSFTTrainer` subclass to catch empty-label and NaN-loss cases during eval; the trainer sketch after this list substitutes the stock `SFTTrainer`
3. **Hyperparameters**
| Parameter | Value |
| -------------------------------- | -----------------: |
| `num_train_epochs` | 3 |
| `per_device_train_batch_size` | 40 |
| `gradient_accumulation_steps` | 1 |
| `per_device_eval_batch_size` | 1 |
| `learning_rate` | 2e-4 |
| `weight_decay` | 0.01 |
| `warmup_steps` | 500 |
| `max_seq_length` | 512 |
| `evaluation_strategy` | steps (every 100) |
| `eval_steps` | 100 |
| `save_strategy` | steps (every 1000) |
| `logging_steps` | 50 |
| `optimizer` | adamw_8bit |
| `gradient_checkpointing` | false |
| `seed` | 3407 |
| `EarlyStoppingCallback patience` | 4 evals |
4. **Training & push**
* Ran `trainer.train()`, merged the LoRA weights, then pushed the merged 16-bit model to `Luigi/SmolLM2-360M-Instruct-TaiwanChat` on Hugging Face via `model.push_to_hub_merged()`
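The data-preparation step might look like the following sketch. It assumes TaiwanChat rows expose a `messages` list of `{role, content}` turns and reuses the `tokenizer` from the adapter sketch above; the marker-cleaning regex and exact filtering order are illustrative, since the original script is not published here.
```python
import re
from datasets import Dataset, load_dataset

MAX_LEN = 512
TARGET = 600_000

# Stream TaiwanChat so the full dataset never has to fit in memory
stream = load_dataset("yentinglin/TaiwanChat", split="train", streaming=True)

texts, seen = [], set()
for row in stream:
    # Render each conversation with the model's chat template
    # (assumes a `messages` field of {role, content} turns)
    text = tokenizer.apply_chat_template(row["messages"], tokenize=False)
    # Illustrative cleanup of stray assistant markers; the real regex is not published
    text = re.sub(r"(?m)^ASSISTANT:\s*", "", text)
    # Deduplicate and enforce the 512-token budget
    if text in seen or len(tokenizer(text).input_ids) > MAX_LEN:
        continue
    seen.add(text)
    texts.append(text)
    if len(texts) >= TARGET:
        break

# Shuffle and carve out the 10% validation split
dataset = Dataset.from_dict({"text": texts})
splits = dataset.train_test_split(test_size=0.1, seed=3407, shuffle=True)
train_ds, eval_ds = splits["train"], splits["test"]
```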
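The hyperparameters in the table map onto a `trl` `SFTConfig` roughly as follows. This is a sketch under stated assumptions: it uses the stock `SFTTrainer` in place of the author's `LoggingSFTTrainer` subclass, and `output_dir` plus the best-model settings (required by `EarlyStoppingCallback`) are not stated in the card.
```python
from transformers import EarlyStoppingCallback
from trl import SFTConfig, SFTTrainer

args = SFTConfig(
    output_dir="outputs",              # assumption; not stated in the card
    num_train_epochs=3,
    per_device_train_batch_size=40,
    gradient_accumulation_steps=1,
    per_device_eval_batch_size=1,
    learning_rate=2e-4,
    weight_decay=0.01,
    warmup_steps=500,
    max_seq_length=512,
    dataset_text_field="text",
    eval_strategy="steps",
    eval_steps=100,
    save_strategy="steps",
    save_steps=1000,
    logging_steps=50,
    optim="adamw_8bit",
    gradient_checkpointing=False,
    seed=3407,
    load_best_model_at_end=True,       # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    processing_class=tokenizer,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=4)],
)
trainer.train()

# Unsloth merges the LoRA weights and uploads a standalone 16-bit checkpoint
model.push_to_hub_merged(
    "Luigi/SmolLM2-360M-Instruct-TaiwanChat",
    tokenizer,
    save_method="merged_16bit",
)
```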
## Example inference
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model; the LoRA weights are already folded in,
# so no PEFT wrapper is needed
tokenizer = AutoTokenizer.from_pretrained("Luigi/SmolLM2-360M-Instruct-TaiwanChat")
model = AutoModelForCausalLM.from_pretrained(
    "Luigi/SmolLM2-360M-Instruct-TaiwanChat",
    torch_dtype=torch.float16,
).eval().to("cuda")

# Query ("What is the weather like in Taipei today?"), rendered with the chat template
test_prompt = "請問台北今天的天氣如何?"
messages = [{"role": "user", "content": test_prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Framework versions
```text
bitsandbytes==0.45.5
datasets==3.2.0
hatchet==1.4.0
importlib_metadata==8.6.1
lit==18.1.8
matplotlib
numpy
packaging
pandas
psutil==6.1.1
pybind11==2.13.6
pytest==8.1.1
redis==6.0.0
scipy
setuptools==70.3.0
Sphinx
sphinx_gallery
sphinx_rtd_theme
tabulate==0.9.0
torch==2.7.0
transformers==4.47.1
trl==0.15.2
unsloth==2025.4.1
unsloth_zoo==2025.4.2
cut_cross_entropy
wandb
wheel==0.45.1
```