GigaBrain-0: A World Model-Powered Vision-Language-Action Model Paper • 2510.19430 • Published 22 days ago • 44
ConsistEdit: Highly Consistent and Precise Training-free Visual Editing Paper • 2510.17803 • Published 23 days ago • 13
UniVerse-1: Unified Audio-Video Generation via Stitching of Experts Paper • 2509.06155 • Published Sep 7 • 13
Motion2Motion: Cross-topology Motion Transfer with Sparse Correspondence Paper • 2508.13139 • Published Aug 18 • 4
Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer Paper • 2508.09131 • Published Aug 12 • 16
LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs Paper • 2506.21862 • Published Jun 27 • 36
HumanMM: Global Human Motion Recovery from Multi-shot Videos Paper • 2503.07597 • Published Mar 10 • 2
Article MotionLCM-V2: Improved Compression Rate for Multi-Latent-Token Diffusion By wxDai • Dec 11, 2024 • 17
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video Paper • 2411.18671 • Published Nov 27, 2024 • 20
DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding Paper • 2411.14347 • Published Nov 21, 2024 • 15 • 3