Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27 • 177
DreamOmni2: Multimodal Instruction-based Editing and Generation Paper • 2510.06679 • Published Oct 8 • 73
DreamOmni: Unified Image Generation and Editing Paper • 2412.17098 • Published Dec 22, 2024 • 2
MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech Paper • 2509.25131 • Published Sep 29 • 15
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition Paper • 2412.09501 • Published Dec 12, 2024 • 48
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27, 2024 • 47
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models Paper • 2311.17043 • Published Nov 28, 2023