-
The Evolution of Multimodal Model Architectures
Paper • 2405.17927 • Published • 1 -
What matters when building vision-language models?
Paper • 2405.02246 • Published • 103 -
Efficient Architectures for High Resolution Vision-Language Models
Paper • 2501.02584 • Published -
Building and better understanding vision-language models: insights and future directions
Paper • 2408.12637 • Published • 133
Collections
Discover the best community collections!
Collections including paper arxiv:2401.13601
-
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 48 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 14 -
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model
Paper • 2405.09215 • Published • 22 -
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
Paper • 2405.14129 • Published • 14
-
Exploring the Potential of Encoder-free Architectures in 3D LMMs
Paper • 2502.09620 • Published • 26 -
The Evolution of Multimodal Model Architectures
Paper • 2405.17927 • Published • 1 -
What matters when building vision-language models?
Paper • 2405.02246 • Published • 103 -
Efficient Architectures for High Resolution Vision-Language Models
Paper • 2501.02584 • Published
-
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 48 -
A Touch, Vision, and Language Dataset for Multimodal Alignment
Paper • 2402.13232 • Published • 16 -
Neural Network Diffusion
Paper • 2402.13144 • Published • 99 -
FlashTex: Fast Relightable Mesh Texturing with LightControlNet
Paper • 2402.13251 • Published • 15
-
The Evolution of Multimodal Model Architectures
Paper • 2405.17927 • Published • 1 -
What matters when building vision-language models?
Paper • 2405.02246 • Published • 103 -
Efficient Architectures for High Resolution Vision-Language Models
Paper • 2501.02584 • Published -
Building and better understanding vision-language models: insights and future directions
Paper • 2408.12637 • Published • 133
-
Exploring the Potential of Encoder-free Architectures in 3D LMMs
Paper • 2502.09620 • Published • 26 -
The Evolution of Multimodal Model Architectures
Paper • 2405.17927 • Published • 1 -
What matters when building vision-language models?
Paper • 2405.02246 • Published • 103 -
Efficient Architectures for High Resolution Vision-Language Models
Paper • 2501.02584 • Published
-
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 48 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 14 -
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model
Paper • 2405.09215 • Published • 22 -
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
Paper • 2405.14129 • Published • 14
-
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 48 -
A Touch, Vision, and Language Dataset for Multimodal Alignment
Paper • 2402.13232 • Published • 16 -
Neural Network Diffusion
Paper • 2402.13144 • Published • 99 -
FlashTex: Fast Relightable Mesh Texturing with LightControlNet
Paper • 2402.13251 • Published • 15