Collections
Discover the best community collections!
Collections including paper arxiv:2409.02813
-
Qwen/Qwen2.5-Omni-7B
Any-to-Any • 11B • Updated • 129k • 1.82k
Qwen2.5 Omni 7B Demo
Space • 361 • Generate text and speech from text, audio, images, and videos
Qwen2.5-Omni Technical Report
Paper • 2503.20215 • Published • 166
openbmb/MiniCPM-o-2_6
Any-to-Any • 9B • Updated • 103k • 1.27k
-
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Paper • 2405.15223 • Published • 17
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 55
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 90
Matryoshka Multimodal Models
Paper • 2405.17430 • Published • 34
-
Meta-Learning a Dynamical Language Model
Paper • 1803.10631 • Published • 1
TLDR: Token Loss Dynamic Reweighting for Reducing Repetitive Utterance Generation
Paper • 2003.11963 • Published
BigScience: A Case Study in the Social Construction of a Multilingual Large Language Model
Paper • 2212.04960 • Published • 1
Continuous Learning in a Hierarchical Multiscale Neural Network
Paper • 1805.05758 • Published • 2
-
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 129
Evolutionary Optimization of Model Merging Recipes
Paper • 2403.13187 • Published • 58
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Paper • 2402.03766 • Published • 15
LLM Agent Operating System
Paper • 2403.16971 • Published • 72
-
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
Paper • 2407.07053 • Published • 47
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Paper • 2407.12772 • Published • 35
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Paper • 2407.11691 • Published • 15
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Paper • 2408.02718 • Published • 62
-
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
Paper • 2409.02813 • Published • 31
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Paper • 2404.16006 • Published
LMArena Leaderboard
Space • 4.67k • Display LMArena Leaderboard
-
Multimodal Clembench
Space • 3 • Explore and compare multimodal models with interactive leaderboards and plots
SEED-Bench Leaderboard
Space • 85 • Submit model evaluation results to leaderboard
-
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Paper • 2311.16502 • Published • 37
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
Paper • 2409.02813 • Published • 31
-
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 71
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 131
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 55
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 90
-
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Paper • 2401.15947 • Published • 53
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Paper • 2404.16821 • Published • 57
Physically Grounded Vision-Language Models for Robotic Manipulation
Paper • 2309.02561 • Published • 9
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
Paper • 2409.02813 • Published • 31
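A listing like the one above can also be pulled programmatically. The snippet below is a minimal sketch using the public huggingface_hub client; the exact "papers/2409.02813" item-filter string and the fields printed are assumptions about that library's API, not anything stated on this page.

```python
# Minimal sketch: list community collections that include the MMMU-Pro paper
# (arXiv 2409.02813) via the public Hugging Face Hub API.
# Assumption: list_collections() accepts an item filter in the "papers/<arxiv-id>"
# form; adjust if the installed huggingface_hub version expects another format.
from huggingface_hub import HfApi

api = HfApi()
collections = api.list_collections(item="papers/2409.02813", limit=20)

for collection in collections:
    # Each result carries the collection title, its slug, and a preview of its items.
    print(collection.title, "->", collection.slug)
    for entry in collection.items:
        # entry.item_type distinguishes models, datasets, Spaces, and papers.
        print("   ", entry.item_type, entry.item_id)
```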