Answer Matching Outperforms Multiple Choice for Language Model Evaluation Paper • 2507.02856 • Published Jul 3, 2025 • 8
Generative Blocks World: Moving Things Around in Pictures Paper • 2506.20703 • Published Jun 25, 2025 • 5
DIP-R1: Deep Inspection and Perception with RL Looking Through and Understanding Complex Scenes Paper • 2505.23179 • Published May 29, 2025 • 1
ReFoCUS: Reinforcement-guided Frame Optimization for Contextual Understanding Paper • 2506.01274 • Published Jun 2, 2025 • 3
Mimetic Initialization Helps State Space Models Learn to Recall Paper • 2410.11135 • Published Oct 14, 2024 • 1
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published Jan 22, 2025 • 89
SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis Paper • 2411.16173 • Published Nov 25, 2024 • 9
Look Every Frame All at Once: Video-Ma²mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing Paper • 2411.19460 • Published Nov 29, 2024 • 11