Context Forcing: Consistent Autoregressive Video Generation with Long Context Paper • 2602.06028 • Published 3 days ago • 30
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published 10 days ago • 149
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models Paper • 2602.02185 • Published 6 days ago • 124
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision Paper • 2601.03193 • Published Jan 6 • 46
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos Paper • 2601.00393 • Published Jan 1 • 130
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time Paper • 2512.25075 • Published Dec 31, 2025 • 15
IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning Paper • 2512.15635 • Published Dec 17, 2025 • 20
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling Paper • 2512.14614 • Published Dec 16, 2025 • 71
OmniPSD: Layered PSD Generation with Diffusion Transformer Paper • 2512.09247 • Published Dec 10, 2025 • 47
Generative Neural Video Compression via Video Diffusion Prior Paper • 2512.05016 • Published Dec 4, 2025 • 10
DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action Paper • 2511.22134 • Published Nov 27, 2025 • 22
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published Nov 27, 2025 • 236
UniMIC: Token-Based Multimodal Interactive Coding for Human-AI Collaboration Paper • 2509.22570 • Published Sep 26, 2025 • 4
UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models Paper • 2509.21760 • Published Sep 26, 2025 • 15
CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios Paper • 2506.13977 • Published Jun 11, 2025 • 10
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning Paper • 2505.22019 • Published May 28, 2025 • 11
Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model Paper • 2504.05594 • Published Apr 8, 2025 • 11
Long-Context Autoregressive Video Modeling with Next-Frame Prediction Paper • 2503.19325 • Published Mar 25, 2025 • 73