Collections

Collections including paper arxiv:2507.01945

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Video Generation

A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality

Paper • 2507.07202 • Published Jul 9 • 24
StreamDiT: Real-Time Streaming Text-to-Video Generation

Paper • 2507.03745 • Published Jul 4 • 31
LongAnimation: Long Animation Generation with Dynamic Global-Local Memory

Paper • 2507.01945 • Published Jul 2 • 78
TokensGen: Harnessing Condensed Tokens for Long Video Generation

Paper • 2507.15728 • Published Jul 21 • 7

yandex/stable-diffusion-3.5-medium-alchemist

Text-to-Image • Updated May 16 • 34 • 6
Ovis-U1 Technical Report

Paper • 2506.23044 • Published Jun 29 • 62
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model

Paper • 2507.01953 • Published Jul 2 • 19
LongAnimation: Long Animation Generation with Dynamic Global-Local Memory

Paper • 2507.01945 • Published Jul 2 • 78

StdGEN: Semantic-Decomposed 3D Character Generation from Single Images

Paper • 2411.05738 • Published Nov 8, 2024 • 15
A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents

Paper • 2410.22476 • Published Oct 29, 2024 • 27
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 49
Training-free Regional Prompting for Diffusion Transformers

Paper • 2411.02395 • Published Nov 4, 2024 • 25

Things I don't understand yet...

Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

Paper • 2506.23918 • Published Jun 30 • 88
LongAnimation: Long Animation Generation with Dynamic Global-Local Memory

Paper • 2507.01945 • Published Jul 2 • 78
How to Train Your LLM Web Agent: A Statistical Diagnosis

Paper • 2507.04103 • Published Jul 5 • 50

(Urban) World Model

SynCity: Training-Free Generation of 3D Worlds

Paper • 2503.16420 • Published Mar 20 • 27
CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities

Paper • 2501.08983 • Published Jan 15 • 22
WorldGrow: Generating Infinite 3D World

Paper • 2510.21682 • Published about 1 month ago • 41
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion

Paper • 2407.13759 • Published Jul 18, 2024 • 18

kudzueye/boreal-hl-v1

Text-to-Video • Updated Feb 10 • 123
ByteDance/AnimateDiff-Lightning

Text-to-Video • Updated Jan 6 • 56.6k • 970
cagliostrolab/animagine-xl-4.0

Text-to-Image • Updated Feb 13 • 234k • 357
Lightricks/LTX-Video

Image-to-Video • Updated Jul 16 • 263k • 2.06k

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

about 20 hours ago

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 23
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 85
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 151
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25
