Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge Feb 7 • 263
Article Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2 Aug 21, 2024 • 41
Article 🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly!– Talking About It? Mar 17 • 348
Article Introducing EuroBERT: A High-Performance Multilingual Encoder Model Mar 10 • 146
Article The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about... Jan 20 • 75
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13 • 99
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models Paper • 2410.07985 • Published Oct 10, 2024 • 32