BrainR's Collections
paper

updated 2 days ago

  • LAPS: A Length-Aware-Prefill LLM Serving System

    Paper • 2601.11589 • Published Jan 4 • 1

  • Taming the Memory Footprint Crisis: System Design for Production Diffusion LLM Serving

    Paper • 2512.17077 • Published Dec 18, 2025

  • PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks

    Paper • 2501.09367 • Published Jan 16, 2025

  • Autellix: An Efficient Serving Engine for LLM Agents as General Programs

    Paper • 2502.13965 • Published Feb 19, 2025 • 19

  • Ascendra: Dynamic Request Prioritization for Efficient LLM Serving

    Paper • 2504.20828 • Published Apr 29, 2025 • 2

  • Block: Balancing Load in LLM Serving with Context, Knowledge and Predictive Scheduling

    Paper • 2508.03611 • Published Aug 5, 2025 • 1

  • semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage

    Paper • 2504.19867 • Published Apr 28, 2025