MeViS Collection MeViS: A Multi-Modal Dataset for Referring Motion Expression Video Segmentation • 2 items • Updated 9 days ago
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI Paper • 2510.05684 • Published Oct 7
Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents Paper • 2510.23691 • Published 26 days ago
OmniAVS Collection Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation • 3 items • Updated Sep 28