Article Accelerating LLM Inference: Fast Sampling with Gumbel-Max Trick Oct 24, 2024 • 14
Rope to Nope and Back Again: A New Hybrid Attention Strategy Paper • 2501.18795 • Published Jan 30 • 12
Article How to generate text: using different decoding methods for language generation with Transformers Mar 1, 2020 • 264
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Paper • 2507.10532 • Published Jul 14 • 88
Article Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders Jul 9 • 717
Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning Paper • 2506.06205 • Published Jun 6 • 30
Just as Humans Need Vaccines, So Do Models: Model Immunization to Combat Falsehoods Paper • 2505.17870 • Published May 23 • 5
Cache Me if You Can: Accelerating Diffusion Models through Block Caching Paper • 2312.03209 • Published Dec 6, 2023 • 22
RealHarm: A Collection of Real-World Language Model Application Failures Paper • 2504.10277 • Published Apr 14 • 10
Min P Sampling: Balancing Creativity and Coherence at High Temperature Paper • 2407.01082 • Published Jul 1, 2024 • 1
Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM Mar 12 • 470
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning Paper • 2502.18080 • Published Feb 25 • 2