Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution Paper • 2511.19430 • Published 9 days ago • 7
SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model Paper • 2306.02245 • Published Jun 4, 2023 • 2
Mini-Monkey: Multi-Scale Adaptive Cropping for Multimodal Large Language Models Paper • 2408.02034 • Published Aug 4, 2024
Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression Paper • 2409.00633 • Published Sep 1, 2024
When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition Paper • 2207.11463 • Published Jul 23, 2022
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation Paper • 2501.14729 • Published Jan 24 • 3
Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception Paper • 2503.13587 • Published Mar 17
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting Paper • 2504.09966 • Published Apr 14
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving Paper • 2505.08725 • Published May 13
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle Paper • 2508.05612 • Published Aug 7 • 2
Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning Paper • 2410.08114 • Published Oct 10, 2024
DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment Paper • 2504.18576 • Published Apr 22
The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey Paper • 2502.10498 • Published Feb 14
More Than Generation: Unifying Generation and Depth Estimation via Text-to-Image Diffusion Models Paper • 2510.23574 • Published Oct 27
NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding Paper • 2510.27481 • Published Oct 31 • 1