RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models Paper • 2404.04929 • Published Apr 7, 2024
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL Paper • 2406.05427 • Published Jun 8, 2024
3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds Paper • 2502.20041 • Published Feb 27, 2025
STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization Paper • 2506.03863 • Published Jun 4, 2025
F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions Paper • 2509.06951 • Published Sep 8, 2025
Hume: Introducing System-2 Thinking in Visual-Language-Action Model Paper • 2505.21432 • Published May 27, 2025
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control Paper • 2508.21112 • Published Aug 28, 2025
StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling Paper • 2507.05240 • Published Jul 7, 2025