Qwen2.5-Omni Collection: End-to-End Omni (text, audio, image, video, and natural speech interaction) model based on Qwen2.5 • 7 items • Updated 26 days ago • 162
Running • Featured • Vision Arena (Testing VLMs side-by-side) • 559 • Display image analysis results
Running • Featured • Open VLM Video Leaderboard • 130 • VLMEvalKit evaluation results on video understanding benchmarks