Qwen3-4B LoRA Adapter (MedMCQA→KLMVD KD)

Adapter type: LoRA (PEFT)
Base model: Qwen/Qwen3-4B
Task: Medical multiple-choice reasoning (distilled on MedMCQA; benchmarked on MedQA)

How to use

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-4B"
adapter_id = "katsukiono/qwen3-4b-medmcqa-klmvd-lora"

tok = AutoTokenizer.from_pretrained(base_id, use_fast=True)
base = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)
# generate
prompt = "Question:\nA 65-year-old woman ...\nOptions:\nA. ...\nB. ...\nC. ...\nD. ...\nAnswer:"
out = model.generate(**tok(prompt, return_tensors="pt").to(model.device), max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
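For multiple-choice evaluation it is often more robust to score the option letters directly than to parse free-form generations. A minimal, self-contained sketch of that selection step, using toy logits and hypothetical token IDs for "A"–"D" (in practice the IDs come from the tokenizer, e.g. tok.convert_tokens_to_ids):

```python
import torch

def pick_choice(next_token_logits, choice_token_ids):
    """Select the answer by a softmax restricted to the option-letter tokens."""
    letters = list(choice_token_ids.keys())
    choice_logits = next_token_logits[list(choice_token_ids.values())]
    probs = torch.softmax(choice_logits, dim=-1)
    return letters[int(probs.argmax())], probs

# Toy example with hypothetical token IDs for A-D.
choice_ids = {"A": 32, "B": 33, "C": 34, "D": 35}
logits = torch.zeros(100)
logits[34] = 2.0  # pretend the model favors "C"
letter, probs = pick_choice(logits, choice_ids)
print(letter)  # prints: C
```

The restricted softmax also yields per-choice probabilities, which is what calibration metrics such as ECE are computed from.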

Training summary

  • Method: KLMVD (KL-budgeted multi-view decoding) → LoRA distillation
  • LoRA target: o_proj, up_proj, gate_proj, down_proj
  • Rank: 8 | α: 32 | dropout: 0.05
  • Optim: AdamW, lr 1e-4, wd 0.01, cosine
  • Trust-region: KL(student||base) ≤ 0.5 (penalized)
  • KD loss: token-KD + α_kd * choice-KL (soft over choices)
  • Datasets: ["openlifescienceai/MedMCQA", "GBaker/MedQA-USMLE-4-options-hf"]
  • Metrics: ["accuracy", "ece", "near-miss"]
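The KD objective in the summary above can be sketched in plain PyTorch: a token-level KL to the teacher, plus α_kd times a soft KL over the four choice tokens, plus a hinge-style penalty when KL(student‖base) exceeds the 0.5 trust-region budget. Tensor shapes, the penalty weight, and the exact reductions are assumptions, not the training code:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, base_logits, choice_ids,
            alpha_kd=0.5, kl_budget=0.5, penalty_weight=1.0):
    """Illustrative combined distillation loss (hyperparameters are assumptions).

    student/teacher/base_logits: (seq_len, vocab) next-token logits.
    choice_ids: token IDs of the option letters, scored at the final position.
    """
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    # Token-level KD: KL(teacher || student), averaged over positions.
    token_kd = F.kl_div(s_logp, t_logp, log_target=True, reduction="batchmean")

    # Choice-level KD: soft KL over the option tokens at the answer position.
    s_choice = F.log_softmax(student_logits[-1, choice_ids], dim=-1)
    t_choice = F.log_softmax(teacher_logits[-1, choice_ids], dim=-1)
    choice_kl = F.kl_div(s_choice, t_choice, log_target=True, reduction="sum")

    # Trust region: penalize KL(student || base) only beyond the budget.
    b_logp = F.log_softmax(base_logits, dim=-1)
    drift = F.kl_div(b_logp, s_logp, log_target=True, reduction="batchmean")
    penalty = penalty_weight * torch.relu(drift - kl_budget)

    return token_kd + alpha_kd * choice_kl + penalty
```

When student, teacher, and base logits coincide, every KL term is zero and the loss vanishes, which is a quick sanity check for the sign conventions of F.kl_div.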

Notes

  • This repo contains only LoRA adapter weights (license-friendly).
  • To publish a merged single model, merge locally and push a separate repo (see merge script).
  • Description: KLMVD→LoRA distillation (MedMCQA); lightweight adapter for MedQA evaluation