PMC-VLM: Medical Visual Question Answering

This model combines the PMC-CLIP vision encoder with the PMC-LLaMA language model for medical visual question answering, fine-tuned on the Kvasir-VQA-x1 dataset.

Model Architecture

  • Vision Encoder: PMC-CLIP (frozen)
  • Language Model: PMC-LLaMA-7B with LoRA adapters
  • Image Projector: Linear projection mapping the image embedding to 4 soft prompt tokens (see the sketch after this list)
  • Training: QLoRA (4-bit quantization) fine-tuning
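
As a rough illustration of the projector described above, the module below maps a single image embedding to 4 soft prompt tokens in the language model's embedding space. This is a minimal sketch, not the repository's implementation; the dimensions (`vision_dim`, `llm_dim`) and the single-linear-layer design are assumptions.

```python
import torch
import torch.nn as nn

class ImageProjector(nn.Module):
    """Sketch: project one vision embedding into N soft prompt tokens.

    Dimensions are illustrative; the actual PMC-CLIP / PMC-LLaMA-7B
    sizes may differ.
    """
    def __init__(self, vision_dim: int = 768, llm_dim: int = 4096,
                 num_tokens: int = 4):
        super().__init__()
        self.num_tokens = num_tokens
        self.llm_dim = llm_dim
        # A single linear layer emits all soft prompt tokens at once.
        self.proj = nn.Linear(vision_dim, num_tokens * llm_dim)

    def forward(self, image_embed: torch.Tensor) -> torch.Tensor:
        # image_embed: (batch, vision_dim) -> (batch, num_tokens, llm_dim)
        out = self.proj(image_embed)
        return out.view(-1, self.num_tokens, self.llm_dim)
```

The resulting token embeddings would be prepended to the question's token embeddings before they enter the LoRA-adapted language model.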

Training Details

  • Dataset: Kvasir-VQA-x1 (Medical VQA)
  • Learning Rate: 0.0002
  • Batch Size: 2 (with 4x accumulation)
  • LoRA Rank: 16 (see the config sketch after this list)
  • LoRA Alpha: 32
  • Epochs: 1
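
These hyperparameters translate into a peft/transformers configuration roughly like the one below. This is a hedged sketch: the quantization type, target modules, and dropout are common defaults, not values confirmed by the training code.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit quantization for QLoRA-style fine-tuning (rank/alpha from the card).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # assumption: standard QLoRA choice
    bnb_4bit_compute_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumption: common LLaMA targets
    lora_dropout=0.05,                    # assumption: not stated in the card
    task_type="CAUSAL_LM",
)
```

With a per-device batch size of 2 and 4x gradient accumulation, the effective batch size is 8.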

Usage

Load the base model with the LoRA adapter from this repository and run inference on medical VQA inputs; a minimal example follows.
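
The snippet below is a minimal sketch using standard `transformers` and `peft` APIs. The base checkpoint id is an assumption (verify against the files in this repo), and the vision side (the PMC-CLIP encoder plus the image projector) requires the repository's own code, so only the text path is shown.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

BASE_ID = "chaoyi-wu/PMC_LLAMA_7B"  # assumption: PMC-LLaMA-7B base checkpoint
ADAPTER_ID = "JustATalentedGuy/PMC-Kvasir-VQA-x1-lora_250918-1352"

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.float16, device_map="auto"
)
# Attach the LoRA adapter fine-tuned on Kvasir-VQA-x1.
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()

# Text-only sketch: in the full pipeline, the 4 soft prompt tokens from
# the image projector are prepended to these prompt embeddings.
prompt = "Question: Are there any visible polyps in the image? Answer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```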
