RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging Paper • 2510.20479 • Published 27 days ago • 10
Video-As-Prompt: Unified Semantic Control for Video Generation Paper • 2510.20888 • Published 27 days ago • 44
Reasoning with Sampling: Your Base Model is Smarter Than You Think Paper • 2510.14901 • Published Oct 16 • 47
DeepAgent: A General Reasoning Agent with Scalable Toolsets Paper • 2510.21618 • Published 26 days ago • 96
PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity Paper • 2510.23603 • Published 23 days ago • 22
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences Paper • 2510.23451 • Published 23 days ago • 26
ACG: Action Coherence Guidance for Flow-based VLA models Paper • 2510.22201 • Published 25 days ago • 36
Rethinking Visual Intelligence: Insights from Video Pretraining Paper • 2510.24448 • Published 22 days ago • 5
From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors Paper • 2510.17439 • Published about 1 month ago • 25
RoboOmni: Proactive Robot Manipulation in Omni-modal Context Paper • 2510.23763 • Published 23 days ago • 53
Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning Paper • 2510.23473 • Published 23 days ago • 82
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark Paper • 2510.26802 • Published 20 days ago • 32
Exploring Conditions for Diffusion models in Robotic Control Paper • 2510.15510 • Published Oct 17 • 39