Qwen3-4B LoRA Adapter (MedMCQA→KLMVD KD)

Adapter type: LoRA (PEFT)
Base model: Qwen/Qwen3-4B
Task: Medical multiple-choice reasoning (distilled on MedMCQA; benchmarked on MedQA)

How to use

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-4B"
adapter_id = "katsukiono/qwen3-4b-medmcqa-klmvd-lora"

tok = AutoTokenizer.from_pretrained(base_id, use_fast=True)
base = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)
# generate
prompt = "Question:\nA 65-year-old woman ...\nOptions:\nA. ...\nB. ...\nC. ...\nD. ...\nAnswer:"
out = model.generate(**tok(prompt, return_tensors="pt").to(model.device), max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
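For multiple-choice evaluation it is often more robust to score the option letters directly than to parse free-form generations. A minimal, self-contained sketch of that selection step, using toy logits and hypothetical token IDs for "A"–"D" (in practice the IDs come from the tokenizer, e.g. tok.convert_tokens_to_ids):

```python
import torch

def pick_choice(next_token_logits, choice_token_ids):
    """Select the answer by a softmax restricted to the option-letter tokens."""
    letters = list(choice_token_ids.keys())
    choice_logits = next_token_logits[list(choice_token_ids.values())]
    probs = torch.softmax(choice_logits, dim=-1)
    return letters[int(probs.argmax())], probs

# Toy example with hypothetical token IDs for A-D.
choice_ids = {"A": 32, "B": 33, "C": 34, "D": 35}
logits = torch.zeros(100)
logits[34] = 2.0  # pretend the model favors "C"
letter, probs = pick_choice(logits, choice_ids)
print(letter)  # prints: C
```

The restricted softmax also yields per-choice probabilities, which is what calibration metrics such as ECE are computed from.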

Training summary

  • Method: KLMVD (KL-budgeted multi-view decoding) → LoRA distillation
  • LoRA target: o_proj, up_proj, gate_proj, down_proj
  • Rank: 8 | α: 32 | dropout: 0.05
  • Optim: AdamW, lr 1e-4, wd 0.01, cosine
  • Trust-region: KL(student||base) ≤ 0.5 (penalized)
  • KD loss: token-KD + α_kd * choice-KL (soft over choices)
  • Datasets: ["openlifescienceai/MedMCQA", "GBaker/MedQA-USMLE-4-options-hf"]
  • Metrics: ["accuracy", "ece", "near-miss"]
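The KD objective in the summary above can be sketched in plain PyTorch: a token-level KL to the teacher, plus α_kd times a soft KL over the four choice tokens, plus a hinge-style penalty when KL(student‖base) exceeds the 0.5 trust-region budget. Tensor shapes, the penalty weight, and the exact reductions are assumptions, not the training code:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, base_logits, choice_ids,
            alpha_kd=0.5, kl_budget=0.5, penalty_weight=1.0):
    """Illustrative combined distillation loss (hyperparameters are assumptions).

    student/teacher/base_logits: (seq_len, vocab) next-token logits.
    choice_ids: token IDs of the option letters, scored at the final position.
    """
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    # Token-level KD: KL(teacher || student), averaged over positions.
    token_kd = F.kl_div(s_logp, t_logp, log_target=True, reduction="batchmean")

    # Choice-level KD: soft KL over the option tokens at the answer position.
    s_choice = F.log_softmax(student_logits[-1, choice_ids], dim=-1)
    t_choice = F.log_softmax(teacher_logits[-1, choice_ids], dim=-1)
    choice_kl = F.kl_div(s_choice, t_choice, log_target=True, reduction="sum")

    # Trust region: penalize KL(student || base) only beyond the budget.
    b_logp = F.log_softmax(base_logits, dim=-1)
    drift = F.kl_div(b_logp, s_logp, log_target=True, reduction="batchmean")
    penalty = penalty_weight * torch.relu(drift - kl_budget)

    return token_kd + alpha_kd * choice_kl + penalty
```

When student, teacher, and base logits coincide, every KL term is zero and the loss vanishes, which is a quick sanity check for the sign conventions of F.kl_div.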

Notes

  • This repo contains only LoRA adapter weights (license-friendly).
  • To publish a merged single model, merge locally and push a separate repo (see merge script).
  • Description: KLMVD→LoRA distillation (MedMCQA); lightweight adapter for MedQA evaluation