Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arXiv:2503.20314

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Video Generation

Wan: Open and Advanced Large-Scale Video Generative Models

Paper • 2503.20314 • Published Mar 26 • 55

Video Generation

Seedance 1.0: Exploring the Boundaries of Video Generation Models

Paper • 2506.09113 • Published Jun 10 • 102
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Paper • 2506.08009 • Published Jun 9 • 30
Seeing Voices: Generating A-Roll Video from Audio with Mirage

Paper • 2506.08279 • Published Jun 9 • 27
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement

Paper • 2506.07848 • Published Jun 9 • 4

ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

Paper • 2503.11647 • Published Mar 14 • 145
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models

Paper • 2503.12885 • Published Mar 17 • 43
TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting

Paper • 2503.17032 • Published Mar 21 • 27
Single Image Iterative Subject-driven Generation and Editing

Paper • 2503.16025 • Published Mar 20 • 14

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 146
Elucidating the Design Space of Diffusion-Based Generative Models

Paper • 2206.00364 • Published Jun 1, 2022 • 18
GLU Variants Improve Transformer

Paper • 2002.05202 • Published Feb 12, 2020 • 4
StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024 • 149

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18, 2024 • 18
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Paper • 2401.09962 • Published Jan 18, 2024 • 9
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Paper • 2401.10404 • Published Jan 18, 2024 • 10
ActAnywhere: Subject-Aware Video Background Generation

Paper • 2401.10822 • Published Jan 19, 2024 • 13

Wan: Open and Advanced Large-Scale Video Generative Models

Paper • 2503.20314 • Published Mar 26 • 55

Exploring the Evolution of Physics Cognition in Video Generation: A Survey

Paper • 2503.21765 • Published Mar 27 • 11
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness

Paper • 2503.21755 • Published Mar 27 • 33
Wan: Open and Advanced Large-Scale Video Generative Models

Paper • 2503.20314 • Published Mar 26 • 55
Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT

Paper • 2502.06782 • Published Feb 10 • 14

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published Dec 31, 2024 • 31
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 107
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 26
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 104

Foundation Models

OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 84
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Paper • 2403.05530 • Published Mar 8, 2024 • 66
StarCoder: may the source be with you!

Paper • 2305.06161 • Published May 9, 2023 • 31
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Paper • 2312.15166 • Published Dec 23, 2023 • 60

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18, 2024 • 18
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Paper • 2401.09962 • Published Jan 18, 2024 • 9
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Paper • 2401.10404 • Published Jan 18, 2024 • 10
ActAnywhere: Subject-Aware Video Background Generation

Paper • 2401.10822 • Published Jan 19, 2024 • 13

Video Generation

Wan: Open and Advanced Large-Scale Video Generative Models

Paper • 2503.20314 • Published Mar 26 • 55

Wan: Open and Advanced Large-Scale Video Generative Models

Paper • 2503.20314 • Published Mar 26 • 55

Video Generation

Seedance 1.0: Exploring the Boundaries of Video Generation Models

Paper • 2506.09113 • Published Jun 10 • 102
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Paper • 2506.08009 • Published Jun 9 • 30
Seeing Voices: Generating A-Roll Video from Audio with Mirage

Paper • 2506.08279 • Published Jun 9 • 27
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement

Paper • 2506.07848 • Published Jun 9 • 4

Exploring the Evolution of Physics Cognition in Video Generation: A Survey

Paper • 2503.21765 • Published Mar 27 • 11
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness

Paper • 2503.21755 • Published Mar 27 • 33
Wan: Open and Advanced Large-Scale Video Generative Models

Paper • 2503.20314 • Published Mar 26 • 55
Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT

Paper • 2502.06782 • Published Feb 10 • 14

ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

Paper • 2503.11647 • Published Mar 14 • 145
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models

Paper • 2503.12885 • Published Mar 17 • 43
TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting

Paper • 2503.17032 • Published Mar 21 • 27
Single Image Iterative Subject-driven Generation and Editing

Paper • 2503.16025 • Published Mar 20 • 14

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published Dec 31, 2024 • 31
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 107
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 26
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 104

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 146
Elucidating the Design Space of Diffusion-Based Generative Models

Paper • 2206.00364 • Published Jun 1, 2022 • 18
GLU Variants Improve Transformer

Paper • 2002.05202 • Published Feb 12, 2020 • 4
StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024 • 149

Foundation Models

OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 84
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Paper • 2403.05530 • Published Mar 8, 2024 • 66
StarCoder: may the source be with you!

Paper • 2305.06161 • Published May 9, 2023 • 31
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Paper • 2312.15166 • Published Dec 23, 2023 • 60

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs