---
library_name: peft
license: apache-2.0
base_model: unsloth/SmolLM2-360M-Instruct
tags:
- unsloth
- trl
- sft
- generated_from_trainer
model-index:
- name: SmolLM2-360M-Instruct-TaiwanChat
  results: []
---

[Visualize in Weights & Biases](https://wandb.ai/pesi/SmolLM2-360M-Instruct-TaiwanChat_CLOUD/runs/9fnxruem)

# SmolLM2-360M-Instruct-TaiwanChat

This model is a fine-tuned version of [unsloth/SmolLM2-360M-Instruct](https://huggingface.co/unsloth/SmolLM2-360M-Instruct) on the TaiwanChat dataset, using Unsloth's 4-bit quantization and LoRA adapters for efficient instruction following in Traditional Chinese.

## Installation

```bash
pip install -r requirements.txt
```

## Requirements

* **Python**: 3.8 or higher
* **CUDA**: 11.0 or higher (for GPU support)
* All other dependencies and exact versions are specified in [requirements.txt](requirements.txt).

## Model description

* **Base**: SmolLM2-360M-Instruct (360M parameters)
* **Quantization**: 4-bit weight quantization (activations kept in full precision)
* **Adapters**: LoRA with rank `r=16`, alpha `α=16`, dropout `0.0`, applied to the projection layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
* **Dataset**: TaiwanChat (`yentinglin/TaiwanChat`), 600k filtered examples at max length 512, streamed and deduplicated, then split 90% train / 10% validation

## Intended uses & limitations

**Intended uses:**

* Conversational AI and chatbots handling Traditional Chinese queries (e.g., weather, FAQs).
* Instruction following in a dialogue format.

**Limitations:**

* The model's limited capacity may cause occasional hallucinations or vague answers.
* Performance was measured on a 10% hold-out split; discrepancies between that split and real-world data may affect quality.
* Quantization and adapter-based tuning trade some accuracy for efficiency.

## Training procedure

1. **Data preparation**
   * Streamed 600k examples from the Hugging Face dataset, filtered them to `max_len=512`, cleaned assistant markers via regex, then shuffled and split with `Dataset.train_test_split(test_size=0.1)`.
2. **Model & training setup** (see the sketch after this list)
   * Loaded the base model with `FastLanguageModel.from_pretrained(..., load_in_4bit=True, full_finetuning=False)`.
   * Applied LoRA adapters via `FastLanguageModel.get_peft_model(...)`.
   * Used a `LoggingSFTTrainer` subclass to catch empty-label and NaN-loss cases during evaluation.
3. **Hyperparameters**

   | Parameter                        |              Value |
   | -------------------------------- | -----------------: |
   | `num_train_epochs`               |                  3 |
   | `per_device_train_batch_size`    |                 40 |
   | `gradient_accumulation_steps`    |                  1 |
   | `per_device_eval_batch_size`     |                  1 |
   | `learning_rate`                  |               2e-4 |
   | `weight_decay`                   |               0.01 |
   | `warmup_steps`                   |                500 |
   | `max_seq_length`                 |                512 |
   | `evaluation_strategy`            |  steps (every 100) |
   | `eval_steps`                     |                100 |
   | `save_strategy`                  | steps (every 1000) |
   | `logging_steps`                  |                 50 |
   | `optimizer`                      |         adamw_8bit |
   | `gradient_checkpointing`         |              false |
   | `seed`                           |               3407 |
   | `EarlyStoppingCallback patience` |            4 evals |
4. **Training & push**
   * Ran `trainer.train()`, merged the LoRA weights, then pushed the merged 16-bit model to `Luigi/SmolLM2-360M-Instruct-TaiwanChat` on Hugging Face via `model.push_to_hub_merged()`.
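The step-2 setup can be sketched roughly as follows. The quantization, LoRA, and hyperparameter values come from the tables above; the dataset variables, `output_dir`, `load_best_model_at_end`, and the use of a plain `SFTTrainer` (in place of the card's `LoggingSFTTrainer` subclass) are illustrative assumptions.

```python
from unsloth import FastLanguageModel  # import unsloth first so its patches apply
from transformers import EarlyStoppingCallback
from trl import SFTConfig, SFTTrainer

# Load the 4-bit base model; adapters only, no full fine-tuning
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/SmolLM2-360M-Instruct",
    max_seq_length=512,
    load_in_4bit=True,
    full_finetuning=False,
)
# Attach LoRA adapters with the configuration from "Model description"
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# train_ds / eval_ds stand in for the 90/10 split from step 1 (placeholder names)
trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    args=SFTConfig(
        output_dir="outputs",            # assumption; not stated in the card
        num_train_epochs=3,
        per_device_train_batch_size=40,
        gradient_accumulation_steps=1,
        per_device_eval_batch_size=1,
        learning_rate=2e-4,
        weight_decay=0.01,
        warmup_steps=500,
        max_seq_length=512,
        eval_strategy="steps",
        eval_steps=100,
        save_strategy="steps",
        save_steps=1000,
        logging_steps=50,
        optim="adamw_8bit",
        gradient_checkpointing=False,
        seed=3407,
        load_best_model_at_end=True,     # needed by EarlyStoppingCallback (assumption)
    ),
    callbacks=[EarlyStoppingCallback(early_stopping_patience=4)],
)
trainer.train()
```

`model.push_to_hub_merged(...)` in step 4 is the Unsloth helper that folds the adapters into the base weights before uploading, which is why the published checkpoint is a merged 16-bit model.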
## Example inference

The published checkpoint is the merged 16-bit model, so it loads directly with `transformers` and needs neither PEFT nor the adapter weights. (A chat-template variant is sketched at the end of this card.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model (LoRA weights are already folded into the base weights)
tokenizer = AutoTokenizer.from_pretrained("Luigi/SmolLM2-360M-Instruct-TaiwanChat")
model = AutoModelForCausalLM.from_pretrained(
    "Luigi/SmolLM2-360M-Instruct-TaiwanChat",
    torch_dtype=torch.float16,
).eval().to("cuda")

# Query: "What is the weather like in Taipei today?"
test_prompt = "請問台北今天的天氣如何?"
inputs = tokenizer(test_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Framework versions

```text
bitsandbytes==0.45.5
datasets==3.2.0
hatchet==1.4.0
importlib_metadata==8.6.1
lit==18.1.8
matplotlib
numpy
packaging
pandas
psutil==6.1.1
pybind11==2.13.6
pytest==8.1.1
redis==6.0.0
scipy
setuptools==70.3.0
Sphinx
sphinx_gallery
sphinx_rtd_theme
tabulate==0.9.0
torch==2.7.0
transformers==4.47.1
trl==0.15.2
unsloth==2025.4.1
unsloth_zoo==2025.4.2
cut_cross_entropy
wandb
wheel==0.45.1
```
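## Example inference with a chat template

Because the base model is instruction-tuned, wrapping the query in the tokenizer's chat template usually yields cleaner replies than the raw-string prompt above. A minimal sketch, assuming the merged checkpoint retains SmolLM2's default chat template; sampling settings mirror the earlier example:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Luigi/SmolLM2-360M-Instruct-TaiwanChat")
model = AutoModelForCausalLM.from_pretrained(
    "Luigi/SmolLM2-360M-Instruct-TaiwanChat",
    torch_dtype=torch.float16,
).eval().to("cuda")

# Single user turn: "What is the weather like in Taipei today?"
messages = [{"role": "user", "content": "請問台北今天的天氣如何?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn marker
    return_tensors="pt",
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=100, do_sample=True, temperature=0.8)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```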