VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding
Paper • 2507.13353 • Published • 1
Kwai Keye-VL Technical Report
Paper • 2507.01949 • Published • 130
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks
Paper • 2507.11336 • Published • 5
Attention is all you need for Videos: Self-attention based Video Summarization using Universal Transformers
Paper • 1906.02792 • Published
Collections including paper arxiv:2403.09626

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Paper • 2403.09626 • Published • 16
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Paper • 2403.10517 • Published • 37
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Paper • 2403.13501 • Published • 9
LITA: Language Instructed Temporal-Localization Assistant
Paper • 2403.19046 • Published • 19

Video as the New Language for Real-World Decision Making
Paper • 2402.17139 • Published • 21
Learning and Leveraging World Models in Visual Representation Learning
Paper • 2403.00504 • Published • 33
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
Paper • 2403.01422 • Published • 29
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models
Paper • 2403.05438 • Published • 21

PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Paper • 2404.16994 • Published • 36
VideoMamba: State Space Model for Efficient Video Understanding
Paper • 2403.06977 • Published • 30
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Paper • 2403.10517 • Published • 37
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Paper • 2403.09626 • Published • 16

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
Paper • 2401.09985 • Published • 18
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
Paper • 2401.09962 • Published • 9
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
Paper • 2401.10404 • Published • 10
ActAnywhere: Subject-Aware Video Background Generation
Paper • 2401.10822 • Published • 13