OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models Paper • 2511.14582 • Published 1 day ago • 13
UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG Paper • 2510.03663 • Published Oct 4 • 15
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play Paper • 2509.25541 • Published Sep 29 • 139
Steering One-Step Diffusion Model with Fidelity-Rich Decoder for Fast Image Compression Paper • 2508.04979 • Published Aug 7 • 5
CoAct-1: Computer-using Agents with Coding as Actions Paper • 2508.03923 • Published Aug 5 • 14
TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs Paper • 2507.21584 • Published Jul 29 • 10
When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios Paper • 2507.20198 • Published Jul 27 • 26
VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents Paper • 2507.04590 • Published Jul 7 • 16
HoliTom: Holistic Token Merging for Fast Video Large Language Models Paper • 2505.21334 • Published May 27 • 21
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published May 14 • 98
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs Paper • 2504.17040 • Published Apr 23 • 13
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models Paper • 2503.16257 • Published Mar 20 • 25
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models Paper • 2503.16419 • Published Mar 20 • 75
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 151
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published Feb 4 • 66
Unifying Specialized Visual Encoders for Video Language Models Paper • 2501.01426 • Published Jan 2 • 21
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems Paper • 2407.01370 • Published Jul 1, 2024 • 89