Time Blindness: Why Video-Language Models Can't See What Humans Can? Paper • 2505.24867 • Published May 30 • 80
VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance Paper • 2505.15952 • Published May 21 • 20
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models Paper • 2502.09696 • Published Feb 13 • 43