Ron Zhu
RzZ
AI & ML interests
None yet
Organizations
None yet
VLM
- UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
  Paper • 2312.15715 • Published • 21
- Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
  Paper • 2505.23747 • Published • 68
- VideoPrism: A Foundational Visual Encoder for Video Understanding
  Paper • 2402.13217 • Published • 37
- Scaling RL to Long Videos
  Paper • 2507.07966 • Published • 157
Robotic
- Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control
  Paper • 2506.01943 • Published • 25
- LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks
  Paper • 2506.00411 • Published • 31
- SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
  Paper • 2506.01844 • Published • 141
Models 11
RzZ/Qwen2.5-VL-3B-GGUF • 3B • Updated • 26
RzZ/Qwen2.5-VL-32B-Instruct-GGUF • 0.7B • Updated • 2
RzZ/sd-v1-4-adapter-seg • Updated • 3
RzZ/sd-v1-4-adapter-depth • Updated • 5
RzZ/sd-v1-4-adapter-keypose • Updated • 8
RzZ/sd-v1-4-adapter-color • Updated • 3
RzZ/sd-v1-4-adapter-canny • Updated • 2
RzZ/sd-v1-4-adapter-sketch • Updated • 2
RzZ/sd-v1-4-adapter-openpose • Updated • 4
RzZ/sd-v1-4-adapter-keypose-depth • Updated • 3