Large Reasoning Models Learn Better Alignment from Flawed Thinking Paper • 2510.00938 • Published Oct 1 • 58
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT Paper • 2509.19284 • Published Sep 23 • 22
Learning to Reason as Action Abstractions with Scalable Mid-Training RL Paper • 2509.25810 • Published Sep 30 • 5
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 185
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Paper • 2507.01352 • Published Jul 2 • 55
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning Paper • 2508.08221 • Published Aug 11 • 48
Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model Paper • 2510.18855 • Published 28 days ago • 68
Revealing the Power of Post-Training for Small Language Models via Knowledge Distillation Paper • 2509.26497 • Published Sep 30
Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation Paper • 2510.22115 • Published 24 days ago • 81
Balanced Actor Initialization: Stable RLHF Training of Distillation-Based Reasoning Models Paper • 2509.00309 • Published Aug 30
APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation Paper • 2509.18521 • Published Sep 23