# PMC-VLM: Medical Visual Question Answering
This model combines PMC-CLIP and PMC-LLaMA for medical visual question answering (VQA), fine-tuned on the Kvasir-VQA-x1 dataset.
## Model Architecture
- Vision Encoder: PMC-CLIP (frozen)
- Language Model: PMC-LLaMA-7B with LoRA adapters
- Image Projector: Linear projection with 4 soft prompt tokens
- Training: QLoRA (4-bit quantization) fine-tuning
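
Below is a minimal sketch of how these components might be wired together. The class names, embedding dimensions (768 for the vision encoder, 4096 for LLaMA-7B), and the pooled-embedding interface are illustrative assumptions, not the exact implementation in this repository:

```python
import torch
import torch.nn as nn

class ImageProjector(nn.Module):
    """Maps a pooled image embedding to N soft prompt tokens in the LLM embedding space."""
    def __init__(self, vision_dim=768, llm_dim=4096, num_tokens=4):
        super().__init__()
        self.num_tokens = num_tokens
        self.proj = nn.Linear(vision_dim, llm_dim * num_tokens)

    def forward(self, image_emb):                      # (B, vision_dim)
        x = self.proj(image_emb)                       # (B, llm_dim * num_tokens)
        return x.view(x.size(0), self.num_tokens, -1)  # (B, num_tokens, llm_dim)

class PMCVLM(nn.Module):
    """Frozen vision encoder + linear projector + causal LM with trainable LoRA adapters."""
    def __init__(self, vision_encoder, projector, language_model):
        super().__init__()
        self.vision_encoder = vision_encoder.eval()
        for p in self.vision_encoder.parameters():     # vision tower stays frozen
            p.requires_grad = False
        self.projector = projector
        self.language_model = language_model           # LoRA adapters are trained here

    def forward(self, pixel_values, input_ids, attention_mask, labels=None):
        with torch.no_grad():
            image_emb = self.vision_encoder(pixel_values)        # (B, vision_dim)
        soft_prompts = self.projector(image_emb)                 # (B, 4, llm_dim)
        text_embeds = self.language_model.get_input_embeddings()(input_ids)
        inputs_embeds = torch.cat([soft_prompts, text_embeds], dim=1)
        prompt_mask = torch.ones(soft_prompts.shape[:2], dtype=attention_mask.dtype,
                                 device=attention_mask.device)
        attention_mask = torch.cat([prompt_mask, attention_mask], dim=1)
        if labels is not None:
            # Soft prompt positions are excluded from the loss
            ignore = torch.full(soft_prompts.shape[:2], -100, dtype=labels.dtype,
                                device=labels.device)
            labels = torch.cat([ignore, labels], dim=1)
        return self.language_model(inputs_embeds=inputs_embeds,
                                   attention_mask=attention_mask,
                                   labels=labels)
```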
## Training Details
- Dataset: Kvasir-VQA-x1 (Medical VQA)
- Learning Rate: 0.0002
- Batch Size: 2 per device (with 4x gradient accumulation, effective batch size 8)
- LoRA Rank: 16
- LoRA Alpha: 32
- Epochs: 1
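
For reference, these hyperparameters map onto a standard QLoRA setup with Hugging Face `transformers` and `peft` roughly as follows. The LoRA dropout, target modules, and compute precision are assumptions not stated in this card:

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit (QLoRA) quantization for the frozen base weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on the language model
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,                                         # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],   # assumption
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="pmc-kvasir-vqa-lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size 8
    learning_rate=2e-4,
    num_train_epochs=1,
    bf16=True,                       # assumption: precision not stated in the card
    logging_steps=10,
)
```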
## Usage
Load the base PMC-LLaMA model, attach the LoRA adapter from this repository, and run inference on medical VQA inputs.
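
The sketch below shows loading the LoRA adapter onto the base language model with `peft`. It covers only the text side as a sanity check; the PMC-CLIP vision encoder and image projector are custom components, so end-to-end VQA inference also needs this repository's own code for loading those weights and injecting the soft prompt tokens:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base_id = "chaoyi-wu/PMC_LLAMA_7B"
adapter_id = "JustATalentedGuy/PMC-Kvasir-VQA-x1-lora_250918-1352"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)   # attach the LoRA adapter
model.eval()

# Text-only sanity check; full VQA additionally requires PMC-CLIP image features
# projected to 4 soft prompt tokens (see the architecture sketch above).
prompt = "Question: What abnormality is visible in the image? Answer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```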
## Model Tree
- Repository: JustATalentedGuy/PMC-Kvasir-VQA-x1-lora_250918-1352
- Base model: chaoyi-wu/PMC_LLAMA_7B