-
Wavelets Are All You Need for Autoregressive Image Generation
Paper • 2406.19997 • Published • 31 -
Wavelet Scattering Transform for Bioacustics: Application to Watkins Marine Mammal Sound Database
Paper • 2402.17775 • Published -
Wavelet Diffusion Neural Operator
Paper • 2412.04833 • Published -
Photorealistic Style Transfer via Wavelet Transforms
Paper • 1903.09760 • Published
Collections
Discover the best community collections!
Collections including paper arxiv:2411.17459
-
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
Paper • 2411.17459 • Published • 12 -
MAGVIT: Masked Generative Video Transformer
Paper • 2212.05199 • Published -
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
Paper • 2310.05737 • Published • 6 -
Finite Scalar Quantization: VQ-VAE Made Simple
Paper • 2309.15505 • Published • 23
-
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Paper • 2405.08748 • Published • 24 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 30 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 131 -
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Paper • 2405.11143 • Published • 41
-
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
Paper • 2401.09985 • Published • 18 -
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
Paper • 2401.09962 • Published • 9 -
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
Paper • 2401.10404 • Published • 10 -
ActAnywhere: Subject-Aware Video Background Generation
Paper • 2401.10822 • Published • 13
-
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Paper • 2404.15653 • Published • 29 -
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 15 -
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper • 2405.12130 • Published • 50 -
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Paper • 2405.12981 • Published • 33
-
Wavelets Are All You Need for Autoregressive Image Generation
Paper • 2406.19997 • Published • 31 -
Wavelet Scattering Transform for Bioacustics: Application to Watkins Marine Mammal Sound Database
Paper • 2402.17775 • Published -
Wavelet Diffusion Neural Operator
Paper • 2412.04833 • Published -
Photorealistic Style Transfer via Wavelet Transforms
Paper • 1903.09760 • Published
-
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
Paper • 2411.17459 • Published • 12 -
MAGVIT: Masked Generative Video Transformer
Paper • 2212.05199 • Published -
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
Paper • 2310.05737 • Published • 6 -
Finite Scalar Quantization: VQ-VAE Made Simple
Paper • 2309.15505 • Published • 23
-
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Paper • 2405.08748 • Published • 24 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 30 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 131 -
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Paper • 2405.11143 • Published • 41
-
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Paper • 2404.15653 • Published • 29 -
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 15 -
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper • 2405.12130 • Published • 50 -
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Paper • 2405.12981 • Published • 33
-
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
Paper • 2401.09985 • Published • 18 -
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
Paper • 2401.09962 • Published • 9 -
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
Paper • 2401.10404 • Published • 10 -
ActAnywhere: Subject-Aware Video Background Generation
Paper • 2401.10822 • Published • 13