From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Haiwen Diao
Paranioar
AI & ML interests
Vision-and-Language, Parameter-efficient Transfer Learning, Multi-modal Large Language Model
Recent Activity
Upvoted a paper about 10 hours ago: Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition
Upvoted a paper 12 days ago: DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation
Updated a collection 14 days ago: NEO1_5