Collections
Collections including paper arxiv:2505.21136
- SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
  Paper • 2411.10958 • Published • 55
- SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference
  Paper • 2502.18137 • Published • 58
- SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training
  Paper • 2505.11594 • Published • 75
- SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
  Paper • 2410.02367 • Published • 49

- SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding
  Paper • 2408.15545 • Published • 38
- Controllable Text Generation for Large Language Models: A Survey
  Paper • 2408.12599 • Published • 65
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 44
- Automated Design of Agentic Systems
  Paper • 2408.08435 • Published • 40

- ReZero: Enhancing LLM search ability by trying one-more-time
  Paper • 2504.11001 • Published • 15
- FonTS: Text Rendering with Typography and Style Controls
  Paper • 2412.00136 • Published • 1
- GenEx: Generating an Explorable World
  Paper • 2412.09624 • Published • 97
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 157

- What Matters in Transformers? Not All Attention is Needed
  Paper • 2406.15786 • Published • 31
- Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
  Paper • 2410.17243 • Published • 93
- Forgetting Transformer: Softmax Attention with a Forget Gate
  Paper • 2503.02130 • Published • 32
- Transformers without Normalization
  Paper • 2503.10622 • Published • 171