ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30 • 141
view article Article The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix 15 days ago • 37
Ferret Collection A framework for training LLM agents via RL with advanced search capability: https://github.com/Tree-Shu-Zhao/ferret • 7 items • Updated 27 days ago • 1
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe Paper • 2509.18154 • Published Sep 16 • 50