DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference Paper • 2602.21548 • Published 10 days ago • 38
Efficient Long-context Language Model Training by Core Attention Disaggregation Paper • 2510.18121 • Published Oct 20, 2025 • 123
The Ultra-Scale Playbook 🌌 Space • Running • 3.72k • The ultimate guide to training LLMs on large GPU clusters
Fast Video Generation with Sliding Tile Attention Paper • 2502.04507 • Published Feb 6, 2025 • 51
Efficiently Serving LLM Reasoning Programs with Certaindex Paper • 2412.20993 • Published Dec 30, 2024 • 36
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs Paper • 2402.15627 • Published Feb 23, 2024 • 36