Efficient Memory Management for Large Language Model Serving with PagedAttention Paper • 2309.06180 • Published Sep 12, 2023 • 46
Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation Paper • 2602.02007 • Published Feb 2 • 16
LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference Paper • 2510.09665 • Published Oct 8, 2025 • 4
Green-VLA: Staged Vision-Language-Action Model for Generalist Robots Paper • 2602.00919 • Published Jan 31 • 315
Agent Lightning: Train ANY AI Agents with Reinforcement Learning Paper • 2508.03680 • Published Aug 5, 2025 • 137
The Ultra-Scale Playbook 🌌 • 3.74k • The ultimate guide to training LLMs on large GPU clusters