Cache-to-Cache: Direct Semantic Communication Between Large Language Models • Paper • 2510.03215 • Published Oct 3 • 96
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs • Paper • 2510.07499 • Published Oct 8 • 48
StreamingVLM: Real-Time Understanding for Infinite Video Streams • Paper • 2510.09608 • Published Oct 10 • 50
LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning • Paper • 2510.14211 • Published Oct 16 • 7
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning • Paper • 2510.19338 • Published 30 days ago • 111
LightMem: Lightweight and Efficient Memory-Augmented Generation • Paper • 2510.18866 • Published about 1 month ago • 109
Glyph: Scaling Context Windows via Visual-Text Compression • Paper • 2510.17800 • Published Oct 20 • 66
Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs • Paper • 2510.24514 • Published 23 days ago • 20
The End of Manual Decoding: Towards Truly End-to-End Language Models • Paper • 2510.26697 • Published 21 days ago • 113
Exploring Conditions for Diffusion models in Robotic Control • Paper • 2510.15510 • Published Oct 17 • 39
Kimi Linear: An Expressive, Efficient Attention Architecture • Paper • 2510.26692 • Published 21 days ago • 108
Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction • Paper • 2505.11254 • Published May 16 • 48