Collections including paper arxiv:2212.09748
- StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation (Paper • 2312.12491 • Published • 74)
- Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (Paper • 2401.11708 • Published • 30)
- Training-Free Consistent Text-to-Image Generation (Paper • 2402.03286 • Published • 67)
- PALP: Prompt Aligned Personalization of Text-to-Image Models (Paper • 2401.06105 • Published • 50)

- Adding Conditional Control to Text-to-Image Diffusion Models (Paper • 2302.05543 • Published • 57)
- ControlNet V1.1 (Space: Transform images using various artistic effects)

- Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators (Paper • 2303.13439 • Published • 5)
- Text2Video-Zero (Space)

- InstructDiffusion: A Generalist Modeling Interface for Vision Tasks (Paper • 2309.03895 • Published • 14)
- ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning (Paper • 2309.16650 • Published • 10)
- CCEdit: Creative and Controllable Video Editing via Diffusion Models (Paper • 2309.16496 • Published • 9)
- FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling (Paper • 2310.15169 • Published • 10)

- A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions (Paper • 2312.08578 • Published • 20)
- ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks (Paper • 2312.08583 • Published • 11)
- Vision-Language Models as a Source of Rewards (Paper • 2312.09187 • Published • 14)
- StemGen: A music generation model that listens (Paper • 2312.08723 • Published • 49)

- Random Field Augmentations for Self-Supervised Representation Learning (Paper • 2311.03629 • Published • 10)
- TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models (Paper • 2311.04589 • Published • 23)
- GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs (Paper • 2311.04901 • Published • 11)
- Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models (Paper • 2311.06783 • Published • 28)

- Understanding Diffusion Models: A Unified Perspective (Paper • 2208.11970 • Published)
- Tutorial on Diffusion Models for Imaging and Vision (Paper • 2403.18103 • Published • 2)
- Denoising Diffusion Probabilistic Models (Paper • 2006.11239 • Published • 6)
- Denoising Diffusion Implicit Models (Paper • 2010.02502 • Published • 4)

- Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts (Paper • 2309.04354 • Published • 15)
- Vision Transformers Need Registers (Paper • 2309.16588 • Published • 83)
- AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models (Paper • 2309.16414 • Published • 19)
- MotionLM: Multi-Agent Motion Forecasting as Language Modeling (Paper • 2309.16534 • Published • 16)