# SwarmMedQA-7B-v1
Medical QA model fine-tuned on clinical Chain-of-Thought reasoning data.
Built by Swarm & Bee (S&B) on the SwarmOS platform.
## Model Details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B-Instruct |
| Method | LoRA (r=64, alpha=128, all attn+MLP) |
| Training data | SwarmOS/SwarmMedQA v0.1.0 (124 gold examples) |
| Epochs | 4 |
| Training time | 2 min 22 sec (2x RTX 3090 Ti) |
| Final loss | 0.66 |
| Token accuracy | 82.4% |
| Trainable params | 161M / 7.8B (2.1%) |
## Benchmark Results

| Benchmark | Base Qwen2.5-7B | SwarmMedQA-7B-v1 | Delta (pp) |
|---|---|---|---|
| MedQA (USMLE, 4-option) | 62.0% | 66.0% | +4.0 |
| PubMedQA (abstract grounding) | 0.0%* | 53.0% | +53.0 |
| Internal benchmark (hard/expert) | 88.9% | 77.8% | -11.1** |
\* The base model gives verbose answers that fail the strict yes/no/maybe parser; the fine-tuned model learned format adherence.

\*\* Small sample (9 examples). The dip reflects mild overfitting from 4 epochs on 124 training examples and is expected to resolve with more data.
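For context, a strict label parser of this kind is sketched below. The exact evaluation-harness parser is not reproduced in this card, so the function name and regex are assumptions; the point is that an answer must lead with the label to score:

```python
import re

def parse_pubmedqa_label(reply: str):
    """Hypothetical strict parser: the reply must lead with yes/no/maybe.

    Verbose base-model answers that bury the label mid-sentence return
    None and are scored as incorrect, which explains the 0.0% baseline.
    """
    match = re.match(r"(yes|no|maybe)\b", reply.strip(), flags=re.IGNORECASE)
    return match.group(1).lower() if match else None

print(parse_pubmedqa_label("Yes. The abstract supports the association."))   # yes
print(parse_pubmedqa_label("Based on the abstract, it seems likely that..."))  # None
```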
## Training Configuration

```yaml
# LoRA
lora_r: 64
lora_alpha: 128
lora_dropout: 0.05
target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]

# Training
learning_rate: 2e-5
lr_scheduler: cosine
warmup_ratio: 0.1
weight_decay: 0.01
epochs: 4
batch_size: 2
gradient_accumulation: 8
effective_batch_size: 16
fp16: true
gradient_checkpointing: true
```
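For reference, the same adapter setup expressed with the `peft` library (a sketch, assuming `peft` and `transformers` are installed; only the hyperparameters above come from the actual run):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load the base model and wrap it with the LoRA adapter described above.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
lora = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # ~161M of ~7.8B params trainable (~2.1%)
```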
## Data Pipeline

Every training example passes through a three-stage automated quality gate:

1. **Verification**: fact-checked against medical literature (factuality score 1-10)
2. **Scoring**: evaluated for clinical relevance, reasoning depth, and educational value
3. **Safety check**: screened for patient-harm potential

Gold criteria: `factuality >= 9 AND reasoning_depth >= 8 AND not rejected AND risk != critical`
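A minimal sketch of that filter as a predicate. The field names are assumptions inferred from the criteria above, not a published dataset schema:

```python
# Field names (factuality, reasoning_depth, rejected, risk) are assumed
# from the gold criteria above; the real pipeline schema is not published.
def is_gold(example: dict) -> bool:
    return (
        example["factuality"] >= 9
        and example["reasoning_depth"] >= 8
        and not example["rejected"]
        and example["risk"] != "critical"
    )

examples = [
    {"factuality": 9, "reasoning_depth": 8, "rejected": False, "risk": "low"},
    {"factuality": 10, "reasoning_depth": 7, "rejected": False, "risk": "low"},
]
gold = [ex for ex in examples if is_gold(ex)]  # keeps only the first example
```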
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the fine-tuned model in fp16 and spread it across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "SwarmOS/SwarmMedQA-7B-v1",
    dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("SwarmOS/SwarmMedQA-7B-v1")

# The model was trained on instruction/input/response-formatted prompts,
# so queries should follow the same template.
prompt = """### Instruction:
You are a board-certified physician. Think step by step and explain your clinical reasoning.
### Input:
A 65-year-old male presents with sudden onset crushing chest pain radiating to the left arm, diaphoresis, and shortness of breath. ECG shows ST-elevation in leads II, III, and aVF. What is the most likely diagnosis and immediate management?
### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Low temperature keeps the clinical reasoning close to deterministic.
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.1, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
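If GPU memory is tight, the checkpoint should also load in 4-bit via bitsandbytes. This is an untested sketch for this particular checkpoint and assumes the `bitsandbytes` package is installed:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization with fp16 compute; roughly quarters VRAM use.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "SwarmOS/SwarmMedQA-7B-v1",
    quantization_config=bnb,
    device_map="auto",
)
```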
## Specialties Covered
Cardiology, Clinical Reasoning, Emergency Medicine, Endocrinology, General Surgery, Gynecology, Neurology, Obstetrics, Oncology, Pediatrics, Pharmacology, Psychiatry
## Limitations

- Trained on only 124 gold examples; this is an early-stage model, not for clinical use
- English only
- Mild overfitting on the hardest examples (4 epochs on a small dataset)
- The PubMedQA improvement is partly format adherence, not purely knowledge gain
- Not a substitute for professional medical advice
## Citation

```bibtex
@misc{swarmos_swarmmedqa_7b_v1,
  title  = {SwarmMedQA-7B-v1: Clinical-Grade Medical QA with Chain-of-Thought},
  author = {Swarm and Bee},
  year   = {2026},
  note   = {Base model: Qwen/Qwen2.5-7B-Instruct; dataset: SwarmOS/SwarmMedQA},
  url    = {https://huggingface.co/SwarmOS/SwarmMedQA-7B-v1}
}
```
## License
Apache 2.0
Built with the Dark Box Engine. We compute intelligence.