Zhang124's Collections
Multimodal Image Classification

updated 3 days ago

  • What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models

    Paper • 2405.15668 • Published May 24, 2024

  • On Large Multimodal Models as Open-World Image Classifiers

    Paper • 2503.21851 • Published Mar 27, 2025 • 5

  • Benchmarking Large Language Models for Image Classification of Marine Mammals

    Paper • 2410.19848 • Published Oct 22, 2024

  • Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

    Paper • 2501.07783 • Published Jan 14, 2025 • 8

  • VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers using eXplainable AI and Language Models

    Paper • 2408.12808 • Published Aug 23, 2024

  • Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers

    Paper • 2412.00142 • Published Nov 28, 2024 • 5

  • Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks

    Paper • 2410.18387 • Published Oct 24, 2024

  • How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

    Paper • 2507.01955 • Published Jul 2, 2025 • 36