Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs • Paper • 2509.25779 • Published Sep 30
HardTests: Synthesizing High-Quality Test Cases for LLM Coding • Paper • 2505.24098 • Published May 30