Taming the Memory Footprint Crisis: System Design for Production Diffusion LLM Serving (arXiv:2512.17077, published Dec 18, 2025)
PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks (arXiv:2501.09367, published Jan 16, 2025)
Autellix: An Efficient Serving Engine for LLM Agents as General Programs (arXiv:2502.13965, published Feb 19, 2025)
Ascendra: Dynamic Request Prioritization for Efficient LLM Serving (arXiv:2504.20828, published Apr 29, 2025)
Block: Balancing Load in LLM Serving with Context, Knowledge and Predictive Scheduling (arXiv:2508.03611, published Aug 5, 2025)
semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage (arXiv:2504.19867, published Apr 28, 2025)