Taming the Memory Footprint Crisis: System Design for Production Diffusion LLM Serving (arXiv:2512.17077, published Dec 18, 2025)
PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks (arXiv:2501.09367, published Jan 16, 2025)
Autellix: An Efficient Serving Engine for LLM Agents as General Programs (arXiv:2502.13965, published Feb 19, 2025)
Ascendra: Dynamic Request Prioritization for Efficient LLM Serving (arXiv:2504.20828, published Apr 29, 2025)
Block: Balancing Load in LLM Serving with Context, Knowledge and Predictive Scheduling (arXiv:2508.03611, published Aug 5, 2025)
semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage (arXiv:2504.19867, published Apr 28, 2025)