- DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
  Paper • 2504.07128 • Published • 86
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 108
- BitNet b1.58 2B4T Technical Report
  Paper • 2504.12285 • Published • 75
- FAST: Efficient Action Tokenization for Vision-Language-Action Models
  Paper • 2501.09747 • Published • 27

Collections
Collections including paper arXiv:2212.09748
- Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models
  Paper • 2409.07452 • Published • 21
- Generating 3D-Consistent Videos from Unposed Internet Photos
  Paper • 2411.13549 • Published
- DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
  Paper • 2411.04928 • Published • 57
- CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models
  Paper • 2412.12093 • Published

- Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
  Paper • 2306.00989 • Published • 1
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 63
- Scalable Diffusion Models with Transformers
  Paper • 2212.09748 • Published • 18
- Matryoshka Representation Learning
  Paper • 2205.13147 • Published • 24

- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 146
- Elucidating the Design Space of Diffusion-Based Generative Models
  Paper • 2206.00364 • Published • 18
- GLU Variants Improve Transformer
  Paper • 2002.05202 • Published • 4
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 149

- ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
  Paper • 2403.05135 • Published • 45
- Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation
  Paper • 2303.00848 • Published
- Scalable Diffusion Models with Transformers
  Paper • 2212.09748 • Published • 18
- High-Resolution Image Synthesis with Latent Diffusion Models
  Paper • 2112.10752 • Published • 14

- aMUSEd: An Open MUSE Reproduction
  Paper • 2401.01808 • Published • 31
- black-forest-labs/FLUX.1-dev
  Text-to-Image • Updated • 1.59M • 11.8k
- Qwen/Qwen2-VL-7B-Instruct
  Image-Text-to-Text • 8B • Updated • 1.64M • 1.24k
- zer0int/CLIP-GmP-ViT-L-14
  Zero-Shot Image Classification • 0.4B • Updated • 6.26k • 502

- Scalable Diffusion Models with Transformers
  Paper • 2212.09748 • Published • 18
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
  Paper • 2311.15127 • Published • 15
- Learning Transferable Visual Models From Natural Language Supervision
  Paper • 2103.00020 • Published • 18
- U-Net: Convolutional Networks for Biomedical Image Segmentation
  Paper • 1505.04597 • Published • 14

- Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
  Paper • 2310.04406 • Published • 10
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 109
- ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
  Paper • 2402.09320 • Published • 6
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
  Paper • 2402.03620 • Published • 117