Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals Paper • 2510.27684 • Published 11 days ago
MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs Paper • 2411.15296 • Published Nov 22, 2024
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published Jan 23
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training Paper • 2509.23661 • Published Sep 28
SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation Paper • 2411.19921 • Published Nov 29, 2024
TokensGen: Harnessing Condensed Tokens for Long Video Generation Paper • 2507.15728 • Published Jul 21
DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior Paper • 2508.00599 • Published Aug 1
Has GPT-5 Achieved Spatial Intelligence? An Empirical Study Paper • 2508.13142 • Published Aug 18
OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation Paper • 2301.07525 • Published Jan 18, 2023
IT3D: Improved Text-to-3D Generation with Explicit View Synthesis Paper • 2308.11473 • Published Aug 22, 2023