VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models Paper • 2511.11007 • Published 24 days ago
Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models Paper • 2511.17487 • Published 17 days ago
VisPlay: Self-Evolving Vision-Language Models from Images Paper • 2511.15661 • Published 19 days ago