What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models Paper • 2405.15668 • Published May 24, 2024
On Large Multimodal Models as Open-World Image Classifiers Paper • 2503.21851 • Published Mar 27, 2025
Benchmarking Large Language Models for Image Classification of Marine Mammals Paper • 2410.19848 • Published Oct 22, 2024
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding Paper • 2501.07783 • Published Jan 14, 2025
VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers using eXplainable AI and Language Models Paper • 2408.12808 • Published Aug 23, 2024
Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers Paper • 2412.00142 • Published Nov 28, 2024
Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks Paper • 2410.18387 • Published Oct 24, 2024
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks Paper • 2507.01955 • Published Jul 2, 2025