Collections including paper arxiv:2504.02542

Collection 1
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework (Paper • 2308.08155 • Published • 10)
- DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence (Paper • 2401.14196 • Published • 66)
- Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation (Paper • 2504.02542 • Published • 51)

Collection 2
- One Shot, One Talk: Whole-body Talking Avatar from a Single Image (Paper • 2412.01106 • Published • 24)
- MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation (Paper • 2412.04448 • Published • 10)
- IDOL: Instant Photorealistic 3D Human Creation from a Single Image (Paper • 2412.14963 • Published • 6)
- OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models (Paper • 2502.01061 • Published • 222)

Collection 3
- Seedance 1.0: Exploring the Boundaries of Video Generation Models (Paper • 2506.09113 • Published • 102)
- Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion (Paper • 2506.08009 • Published • 30)
- Seeing Voices: Generating A-Roll Video from Audio with Mirage (Paper • 2506.08279 • Published • 27)
- PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement (Paper • 2506.07848 • Published • 4)

Collection 4
- ReCamMaster: Camera-Controlled Generative Rendering from A Single Video (Paper • 2503.11647 • Published • 145)
- DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models (Paper • 2503.12885 • Published • 43)
- TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting (Paper • 2503.17032 • Published • 27)
- Single Image Iterative Subject-driven Generation and Editing (Paper • 2503.16025 • Published • 14)

Collection 5
- VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping (Paper • 2412.11279 • Published • 13)
- MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control (Paper • 2501.02260 • Published • 5)
- GaussianAvatar-Editor: Photorealistic Animatable Gaussian Head Avatar Editor (Paper • 2501.09978 • Published • 6)
- FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation (Paper • 2502.13995 • Published • 9)

Collection 6
- Instruct-Imagen: Image Generation with Multi-modal Instruction (Paper • 2401.01952 • Published • 32)
- ODIN: A Single Model for 2D and 3D Perception (Paper • 2401.02416 • Published • 13)
- Bigger is not Always Better: Scaling Properties of Latent Diffusion Models (Paper • 2404.01367 • Published • 22)
- Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models (Paper • 2404.02747 • Published • 13)