GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation Paper • 2512.17495 • Published 9 days ago
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published Oct 16
DCA: Diversified Co-Attention towards Informative Live Video Commenting Paper • 1911.02739 • Published Nov 7, 2019
TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding Paper • 2310.19060 • Published Oct 29, 2023
VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models Paper • 2311.17404 • Published Nov 29, 2023
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding Paper • 2312.02051 • Published Dec 4, 2023