NeuroVFM: Health system learning achieves generalist neuroimaging models
Preprint / Interactive Demo / GitHub / MLiNS Lab
This is the model card for the NeuroVFM findings generation model, a multimodal LLM trained on >270,000 MRI and CT neuroimaging studies (images + expert radiologist reports) and finetuned to generate radiological findings for a given input study.
This model packages the visual feature backbone (NeuroVFM), connector, and LLM together. For the feature backbone alone, please see here.
Model Details
- Architecture: 3D Vision Transformer (ViT-Base/4x16x16px) + Open-source LLM (Qwen3-14B) + Perceiver/Resampler Connector
- Training Data: UM-NeuroImages
- Diversity: ~270,000 unique studies (CT & MRI) acquired over 20 years + associated expert radiologist reports
- Training Objective: 2-stage visual instruction tuning (LLaVA-style)
  - Stage 1 (alignment): 2 epochs, only the connector trainable
  - Stage 2 (finetuning): 1 epoch, connector + LLM trainable
- Compute Hardware: Trained on 8x NVIDIA L40S GPUs (48GB VRAM)
- Training Efficiency: <500 GPU-hours total pretraining time (Automatic Mixed Precision with PyTorch FSDP)
- Optimization: AdamW, LR of 1.0e-3 (stage 1) and 2.0e-5 (stage 2), Cosine Decay (3% warmup)
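To make the "Cosine Decay (3% warmup)" schedule concrete, here is a minimal sketch in plain Python. It assumes a linear warmup over the first 3% of steps and a decay to zero; the card states neither the warmup shape nor the final learning rate. The function name `lr_at_step`, the `min_lr` parameter, and the step counts are hypothetical and are not taken from the NeuroVFM training code.

```python
import math


def lr_at_step(step, total_steps, peak_lr, warmup_frac=0.03, min_lr=0.0):
    """Learning rate at a given optimizer step.

    Assumed shape: linear warmup to peak_lr, then cosine decay to min_lr.
    """
    warmup_steps = max(1, int(round(warmup_frac * total_steps)))
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Peak learning rates from the card; total_steps is an illustrative value.
stage1_lr = lr_at_step(step=500, total_steps=1000, peak_lr=1.0e-3)
stage2_lr = lr_at_step(step=500, total_steps=1000, peak_lr=2.0e-5)
```

In an actual training loop the same shape is usually obtained from a scheduler attached to the AdamW optimizer rather than computed by hand; the closed form above is only meant to show how the two peak learning rates and the warmup fraction interact.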
Quick Start
The easiest way to use NeuroVFM is through our Python package:
from neurovfm import load_vlm
generator, preproc = load_vlm("mlinslab/neurovfm-llm")
vols = preproc.load_study("/path/to/study/") # study directory with 1+ DICOM/NIfTI files
# clinical_context = "LOC and nausea." # optional clinical context
clinical_context = None
findings = generator.generate(vols, clinical_context)
>>> findings
Detected Study Type: CT HEAD WITHOUT CONTRAST
Findings:
1. Acute subarachnoid hemorrhage centered in the basal cisterns with extension into
the perimesencephalic cisterns and along the tentorial incisura.
2. Hemorrhage and clot abut and partially efface the cerebral aqueduct with prominence
of the temporal horns.
3. Mild mass effect on the midbrain.
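For downstream analysis it can help to split the generated text into a study type and a list of individual findings. The sketch below assumes that `generator.generate` returns a single string laid out as in the example above (a "Detected Study Type:" line, a "Findings:" line, then numbered items that may wrap across lines). The card does not document the return type, so treat this as an assumption; `parse_findings` is a hypothetical helper, not part of the `neurovfm` package.

```python
import re


def parse_findings(text):
    """Split generated text into (study_type, [finding, ...]).

    Assumes the layout shown in the Quick Start example; wrapped lines
    are appended to the numbered finding that precedes them.
    """
    study_type = None
    findings = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        lowered = line.lower()
        if lowered.startswith("detected study type:"):
            study_type = line.split(":", 1)[1].strip()
        elif lowered.startswith("findings:"):
            continue
        else:
            match = re.match(r"^(\d+)\.\s+(.*)$", line)
            if match:
                # A new numbered finding starts here.
                findings.append(match.group(2))
            elif findings:
                # Continuation of the previous finding.
                findings[-1] += " " + line
    return study_type, findings


# Example text copied from the Quick Start output above.
example = """Detected Study Type: CT HEAD WITHOUT CONTRAST
Findings:
1. Acute subarachnoid hemorrhage centered in the basal cisterns with extension into
the perimesencephalic cisterns and along the tentorial incisura.
2. Hemorrhage and clot abut and partially efface the cerebral aqueduct with prominence
of the temporal horns.
3. Mild mass effect on the midbrain.
"""

study_type, items = parse_findings(example)
```

If the real output is structured differently (for example, a dict or a list), this helper is unnecessary and the fields can be read directly.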
Limitations & Safety
This model is a research tool. It has not been approved by the FDA or any regulatory body for clinical use. While trained on a diverse health system population, the model may carry biases intrinsic to the University of Michigan patient cohort. When used for generation, the system may still hallucinate findings, though at a lower rate than pure language models. Outputs must be verified by a clinician.
License
- Weights: CC-BY-NC-SA 4.0 (Non-Commercial Research Use)
- Code: MIT License