StreamingVLM: Real-Time Understanding for Infinite Video Streams Paper • 2510.09608 • Published Oct 10 • 50
POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion Paper • 2509.01215 • Published Sep 1 • 50
Interpretable and Reliable Detection of AI-Generated Images via Grounded Reasoning in MLLMs Paper • 2506.07045 • Published Jun 8 • 8
Interpretable and Reliable Detection of AI-Generated Images via Grounded Reasoning in MLLMs Paper • 2506.07045 • Published Jun 8 • 8 • 2
Interpretable and Reliable Detection of AI-Generated Images via Grounded Reasoning in MLLMs Paper • 2506.07045 • Published Jun 8 • 8
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models Paper • 2506.05176 • Published Jun 5 • 74
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2 • 141
Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking • 7 items • Updated 11 days ago • 76