Qwen3-4B Medical QA
This is a medical question-answering model based on Qwen/Qwen3-4B-Instruct-2507. It was fine-tuned with LoRA (using DoRA) on medical QA datasets and then merged into a single standalone model, so no separate adapter loading is required.
Model Details
Base Model
- Base Model: Qwen/Qwen3-4B-Instruct-2507
- Model Type: Causal Language Model (fully merged)
- Parameters: 4.02B
- Architecture: Qwen3ForCausalLM
- Precision: BFloat16
- Context Length: 262,144 tokens
- License: Same as base model
Fine-tuning Details
- Method: LoRA with DoRA (Weight-Decomposed Low-Rank Adaptation)
- LoRA Rank: 64
- LoRA Alpha: 64
- LoRA Dropout: 0.1
- Target Modules: q_proj, k_proj, v_proj, o_proj
- Training Framework: LLaMA-Factory
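For reference, these settings correspond roughly to the PEFT configuration sketched below. This is an illustrative LoraConfig, not the exact LLaMA-Factory invocation used for training; note that the use_dora flag requires a recent PEFT release.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Sketch of the adapter configuration described above (illustrative,
# not the exact LLaMA-Factory setup).
lora_config = LoraConfig(
    r=64,                      # LoRA rank
    lora_alpha=64,             # scaling factor
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_dora=True,             # Weight-Decomposed Low-Rank Adaptation
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()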
Performance
- Validation Accuracy: 82.18%
- Validation Loss: 0.7984
- Training Dataset: combined_selected_train (medical QA)
Training Details
Training Hyperparameters
- Learning Rate: 3e-4
- LR Scheduler: constant_with_warmup
- Optimizer: AdamW (fused)
- Number of Epochs: 3.0
- Effective Batch Size: 48 (1 per device × 48 GPUs)
- Per-device Train Batch Size: 1
- Per-device Eval Batch Size: 8
- Seed: 42
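As a rough guide, these hyperparameters map onto something like the following Transformers TrainingArguments. This is a sketch, not the actual LLaMA-Factory configuration; output_dir and the bf16 flag are assumptions.

from transformers import TrainingArguments

# Approximate equivalent of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="qwen3-4b-medical-qa",  # hypothetical path
    learning_rate=3e-4,
    lr_scheduler_type="constant_with_warmup",
    optim="adamw_torch_fused",
    num_train_epochs=3.0,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    seed=42,
    bf16=True,                         # assumed, matching the BF16 checkpoint
)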
Training Results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 1.0377 | 1.18 | 20 | 1.1270 | 0.7382 |
| 0.6478 | 2.35 | 40 | 0.8764 | 0.7388 |
| - | 3.00 | 51 | 0.7984 | 0.8218 |
Framework Versions
- PEFT: 0.15.2
- Transformers: 4.55.0
- PyTorch: 2.8.0+cu128
- Datasets: 3.6.0
- Tokenizers: 0.21.1
Usage
Quick Start
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Acryl-Jonathan-01/qwen3-4b-medical-qa-merged",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "Acryl-Jonathan-01/qwen3-4b-medical-qa-merged",
    trust_remote_code=True
)

# Prepare messages
messages = [
    {"role": "system", "content": "You are a helpful medical assistant."},
    {"role": "user", "content": "What are the common symptoms of pneumonia?"}
]

# Apply the chat template and generate a response
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20
)

# Decode only the newly generated tokens (drop the prompt)
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True
)
print(response)
Using with vLLM (Faster Inference)
from vllm import LLM, SamplingParams

# Initialize vLLM
llm = LLM(
    model="Acryl-Jonathan-01/qwen3-4b-medical-qa-merged",
    trust_remote_code=True,
    dtype="bfloat16"
)

# Sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    max_tokens=512
)

# Generate
prompts = ["What are the symptoms of diabetes?"]
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
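Since this is an instruct-tuned model, raw prompts like the one above bypass the chat template. A minimal sketch of formatting the prompt first, reusing the llm and sampling_params objects defined above (the question text is illustrative):

from transformers import AutoTokenizer

# Wrap the question in the model's chat template before generation.
tokenizer = AutoTokenizer.from_pretrained("Acryl-Jonathan-01/qwen3-4b-medical-qa-merged")
chat_prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What are the symptoms of diabetes?"}],
    tokenize=False,
    add_generation_prompt=True
)
# llm and sampling_params come from the snippet above
outputs = llm.generate([chat_prompt], sampling_params)
print(outputs[0].outputs[0].text)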
Using with Ollama
A Modelfile is included for easy deployment with Ollama:
# Create Ollama model
ollama create qwen3-medical -f Modelfile
# Run the model
ollama run qwen3-medical "What are the symptoms of hypertension?"
Quantization (Optional)
For resource-constrained environments, you can quantize the model:
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype="bfloat16",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)
model = AutoModelForCausalLM.from_pretrained(
    "Acryl-Jonathan-01/qwen3-4b-medical-qa-merged",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)
Intended Use
This model is designed for:
- Medical question-answering in educational contexts
- Medical knowledge exploration and learning
- Research in medical AI and NLP applications
- Prototype development for medical chatbots and assistants
Limitations & Warnings
IMPORTANT DISCLAIMERS:
- ⚠️ NOT for clinical use: This model should NEVER be used for actual clinical decision-making, diagnosis, or treatment without qualified medical professional oversight
- ⚠️ Educational purposes only: Intended for education, research, and development purposes
- ⚠️ May contain errors: The model can generate incorrect or outdated medical information
- ⚠️ Bias: May inherit biases from training data
- ⚠️ Hallucination: Like all LLMs, may generate plausible-sounding but incorrect information
- ⚠️ Not a replacement: Always consult qualified healthcare professionals for medical advice
Performance Limitations
- Performance may vary on out-of-distribution medical questions
- Better suited for common medical topics in the training distribution
- May struggle with very recent medical developments (knowledge cutoff)
- Accuracy is dataset-dependent and not guaranteed
Model Size & Requirements
- Model Size: ~7.6GB (BFloat16)
- Recommended VRAM: 16GB+ for inference
- Quantized (4-bit): ~2-3GB VRAM
- CPU Inference: Possible but slow
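The BF16 size figure follows directly from the parameter count; a quick back-of-the-envelope check:

# BF16 stores 2 bytes per parameter
params = 4.02e9
size_gib = params * 2 / 2**30
print(f"{size_gib:.1f} GiB")  # ~7.5 GiB, consistent with the ~7.6GB figure above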
Files Structure
qwen3-4b-medical-qa-merged/
├── model-00001-of-00005.safetensors  # Model weights (shard 1)
├── model-00002-of-00005.safetensors  # Model weights (shard 2)
├── model-00003-of-00005.safetensors  # Model weights (shard 3)
├── model-00004-of-00005.safetensors  # Model weights (shard 4)
├── model-00005-of-00005.safetensors  # Model weights (shard 5)
├── model.safetensors.index.json      # Shard index
├── config.json                       # Model configuration
├── generation_config.json            # Generation settings
├── tokenizer.json                    # Tokenizer
├── tokenizer_config.json             # Tokenizer config
├── vocab.json                        # Vocabulary
├── merges.txt                        # BPE merges
├── chat_template.jinja               # Chat template
├── special_tokens_map.json           # Special tokens
├── added_tokens.json                 # Added tokens
├── Modelfile                         # Ollama modelfile
└── README.md                         # This file
Comparison: LoRA Adapter vs Merged Model
Advantages of merged model:
- ✅ Easier to use (no need to load adapter separately)
- ✅ Faster inference (no adapter overhead)
- ✅ Compatible with more inference engines (vLLM, Ollama, etc.)
- ✅ Can be quantized directly
Disadvantages:
- ❌ Larger file size (~7.6GB vs ~182MB for adapter)
- ❌ Less flexible (can't swap adapters easily)
- ❌ Takes more storage space
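For context, merging a LoRA/DoRA adapter into its base model with PEFT typically looks like the sketch below. The adapter path is hypothetical, and this is not necessarily the exact procedure used to produce this checkpoint.

from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, attach the trained adapter, fold the adapter
# weights into the base weights, and save a standalone checkpoint.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",
    torch_dtype="auto"
)
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # hypothetical adapter path
merged = model.merge_and_unload()
merged.save_pretrained("qwen3-4b-medical-qa-merged")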
Citation
If you use this model in your research or applications, please cite:
@misc{qwen3-4b-medical-qa-merged,
  author       = {Your Name},
  title        = {Qwen3-4B Medical QA},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Acryl-Jonathan-01/qwen3-4b-medical-qa-merged}}
}
Also consider citing the base model:
@misc{qwen3-2025,
  title     = {Qwen3 Technical Report},
  author    = {Qwen Team},
  year      = {2025},
  publisher = {Alibaba Cloud},
  url       = {https://huggingface.co/Qwen}
}
Acknowledgments
- Base Model: Qwen Team
- Training Framework: LLaMA-Factory
- PEFT Library: Hugging Face PEFT
- Transformers Library: Hugging Face Transformers
License
This model inherits the license from the base model Qwen/Qwen3-4B-Instruct-2507. Please refer to the base model's license for usage terms and conditions.
Contact & Support
For issues, questions, or contributions:
- Open an issue on the model repository
- Refer to LLaMA-Factory documentation for training questions