# PMC-VLM: Medical Visual Question Answering
This model combines PMC-CLIP and PMC-LLaMA for medical visual question answering (VQA), fine-tuned on the Kvasir-VQA-x1 dataset.
## Model Architecture
- Vision Encoder: PMC-CLIP (frozen)
- Language Model: PMC-LLaMA-7B with LoRA adapters
- Image Projector: Linear projection with 4 soft prompt tokens
- Training: QLoRA (4-bit quantization) fine-tuning
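
Below is a minimal sketch of how these components might be wired together. The class names, embedding dimensions (768 for the vision encoder, 4096 for LLaMA-7B), and the pooled-embedding interface are illustrative assumptions, not the exact implementation in this repository:

```python
import torch
import torch.nn as nn

class ImageProjector(nn.Module):
    """Maps a pooled image embedding to N soft prompt tokens in the LLM embedding space."""
    def __init__(self, vision_dim=768, llm_dim=4096, num_tokens=4):
        super().__init__()
        self.num_tokens = num_tokens
        self.proj = nn.Linear(vision_dim, llm_dim * num_tokens)

    def forward(self, image_emb):                      # (B, vision_dim)
        x = self.proj(image_emb)                       # (B, llm_dim * num_tokens)
        return x.view(x.size(0), self.num_tokens, -1)  # (B, num_tokens, llm_dim)

class PMCVLM(nn.Module):
    """Frozen vision encoder + linear projector + causal LM with trainable LoRA adapters."""
    def __init__(self, vision_encoder, projector, language_model):
        super().__init__()
        self.vision_encoder = vision_encoder.eval()
        for p in self.vision_encoder.parameters():     # vision tower stays frozen
            p.requires_grad = False
        self.projector = projector
        self.language_model = language_model           # LoRA adapters are trained here

    def forward(self, pixel_values, input_ids, attention_mask, labels=None):
        with torch.no_grad():
            image_emb = self.vision_encoder(pixel_values)        # (B, vision_dim)
        soft_prompts = self.projector(image_emb)                 # (B, 4, llm_dim)
        text_embeds = self.language_model.get_input_embeddings()(input_ids)
        inputs_embeds = torch.cat([soft_prompts, text_embeds], dim=1)
        prompt_mask = torch.ones(soft_prompts.shape[:2], dtype=attention_mask.dtype,
                                 device=attention_mask.device)
        attention_mask = torch.cat([prompt_mask, attention_mask], dim=1)
        if labels is not None:
            # Soft prompt positions are excluded from the loss
            ignore = torch.full(soft_prompts.shape[:2], -100, dtype=labels.dtype,
                                device=labels.device)
            labels = torch.cat([ignore, labels], dim=1)
        return self.language_model(inputs_embeds=inputs_embeds,
                                   attention_mask=attention_mask,
                                   labels=labels)
```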
## Training Details
- Dataset: Kvasir-VQA-x1 (Medical VQA)
- Learning Rate: 0.0002
- Batch Size: 2 per device (with 4x gradient accumulation, effective batch size 8)
- LoRA Rank: 16
- LoRA Alpha: 32
- Epochs: 1
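
For reference, these hyperparameters map onto a standard QLoRA setup with Hugging Face `transformers` and `peft` roughly as follows. The LoRA dropout, target modules, and compute precision are assumptions not stated in this card:

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit (QLoRA) quantization for the frozen base weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on the language model
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,                                         # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],   # assumption
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="pmc-kvasir-vqa-lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size 8
    learning_rate=2e-4,
    num_train_epochs=1,
    bf16=True,                       # assumption: precision not stated in the card
    logging_steps=10,
)
```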
## Usage
Load the base PMC-LLaMA model, attach the LoRA adapter from this repository, and run inference on medical VQA inputs.
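
The sketch below shows loading the LoRA adapter onto the base language model with `peft`. It covers only the text side as a sanity check; the PMC-CLIP vision encoder and image projector are custom components, so end-to-end VQA inference also needs this repository's own code for loading those weights and injecting the soft prompt tokens:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base_id = "chaoyi-wu/PMC_LLAMA_7B"
adapter_id = "JustATalentedGuy/PMC-Kvasir-VQA-x1-lora_250918-1352"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)   # attach the LoRA adapter
model.eval()

# Text-only sanity check; full VQA additionally requires PMC-CLIP image features
# projected to 4 soft prompt tokens (see the architecture sketch above).
prompt = "Question: What abnormality is visible in the image? Answer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```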
## Model Tree
- Repository: JustATalentedGuy/PMC-Kvasir-VQA-x1-lora_250918-1352
- Base model: chaoyi-wu/PMC_LLAMA_7B