Qwen3-4B LoRA Adapter (MedMCQA→KLMVD KD)
Adapter type: LoRA (PEFT)
Base model: Qwen/Qwen3-4B
Task: Medical multiple-choice reasoning (KD/distillation on MedMCQA, benchmarked on MedQA)
How to use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-4B"
adapter_id = "katsukiono/qwen3-4b-medmcqa-klmvd-lora"

# Load the tokenizer and base model, then attach the LoRA adapter.
tok = AutoTokenizer.from_pretrained(base_id, use_fast=True)
base = AutoModelForCausalLM.from_pretrained(
    base_id, trust_remote_code=True, dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

# Generate an answer for a multiple-choice prompt.
prompt = "Question:\nA 65-year-old woman ...\nOptions:\nA. ...\nB. ...\nC. ...\nD. ...\nAnswer:"
out = model.generate(**tok(prompt, return_tensors="pt").to(model.device), max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```
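For benchmark-style evaluation it can be more robust to compare the model's next-token probabilities over the option letters than to parse free-form generations. Below is a minimal sketch that reuses `tok`, `model`, and `prompt` from above; the choice of leading-space letter tokens is an assumption about the tokenizer and is not part of the released code.

```python
import torch

# Score the four options directly from the next-token logits after "Answer:".
# Assumes each option letter (with a leading space) maps to a single token;
# this helper is illustrative only.
choice_ids = [tok.encode(f" {c}", add_special_tokens=False)[0] for c in "ABCD"]

with torch.no_grad():
    logits = model(**tok(prompt, return_tensors="pt").to(model.device)).logits

# Softmax over just the four option-letter logits at the last position.
probs = torch.softmax(logits[0, -1, choice_ids], dim=-1)
print({c: round(p.item(), 3) for c, p in zip("ABCD", probs)})
```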
Training summary
- Method: KLMVD (KL-budgeted multi-view decoding) → LoRA distillation
- LoRA target modules: o_proj, up_proj, gate_proj, down_proj
- Rank: 8 | α: 32 | dropout: 0.05
- Optimizer: AdamW, lr 1e-4, weight decay 0.01, cosine schedule
- Trust-region: KL(student||base) ≤ 0.5 (penalized)
- KD loss: token-KD + α_kd * choice-KL (soft distribution over the answer choices); see the sketch after this list
- Datasets: openlifescienceai/MedMCQA (train), GBaker/MedQA-USMLE-4-options-hf (eval benchmark)
- Metrics: accuracy, ECE (expected calibration error), near-miss
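The loss terms above combine into a single objective. The PyTorch sketch below shows one plausible formulation under stated assumptions: `alpha_kd`, `lam`, and the temperature `tau` are illustrative hyperparameters (only the 0.5 KL budget comes from this card), and the hinge penalty is one common way to implement a "penalized" trust region.

```python
import torch.nn.functional as F

def klmvd_kd_loss(student_logits, teacher_logits, base_logits,
                  student_choice_logits, teacher_choice_logits,
                  alpha_kd=1.0, kl_budget=0.5, lam=1.0, tau=1.0):
    """Sketch: token-KD + alpha_kd * choice-KL, plus a hinge penalty once
    KL(student||base) exceeds the trust-region budget. Token logits are
    [num_tokens, vocab]; choice logits are [num_questions, 4].
    alpha_kd / lam / tau are illustrative, not values from the run."""
    # Token-level KD: KL(teacher || student) over the vocabulary.
    token_kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau**2
    # Choice-level KL: soft distribution over the A-D answer options.
    choice_kl = F.kl_div(
        F.log_softmax(student_choice_logits, dim=-1),
        F.softmax(teacher_choice_logits, dim=-1),
        reduction="batchmean",
    )
    # Trust region: penalize KL(student || base) only above the budget.
    log_p_s = F.log_softmax(student_logits, dim=-1)
    log_p_b = F.log_softmax(base_logits, dim=-1)
    kl_to_base = (log_p_s.exp() * (log_p_s - log_p_b)).sum(-1).mean()
    trust_penalty = lam * F.relu(kl_to_base - kl_budget)
    return token_kd + alpha_kd * choice_kl + trust_penalty
```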
Notes
- This repo contains only the LoRA adapter weights (license-friendly).
- To publish a single merged model, merge the adapter locally and push it to a separate repo (see the merge script; a sketch follows below).
- Description: KLMVD→LoRA distillation (MedMCQA) / lightweight adapter for MedQA evaluation
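For reference, a minimal merge sketch using PEFT's `merge_and_unload()`. This is not the repo's actual merge script; the output directory and Hub repo name are placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B", dtype="auto")
merged = PeftModel.from_pretrained(
    base, "katsukiono/qwen3-4b-medmcqa-klmvd-lora"
).merge_and_unload()  # folds the LoRA deltas into the base weights

# Save (and optionally push) the standalone checkpoint; names are placeholders.
merged.save_pretrained("qwen3-4b-medmcqa-klmvd-merged")
AutoTokenizer.from_pretrained("Qwen/Qwen3-4B").save_pretrained("qwen3-4b-medmcqa-klmvd-merged")
# merged.push_to_hub("<your-username>/qwen3-4b-medmcqa-klmvd-merged")
```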