Bridging Offline and Online Reinforcement Learning for LLMs Paper • 2506.21495 • Published Jun 26
CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks Paper • 2507.23751 • Published Jul 31
OptimalThinkingBench: Evaluating Over and Underthinking in LLMs Paper • 2508.13141 • Published Aug 18
SPICE: Self-Play In Corpus Environments Improves Reasoning Paper • 2510.24684 • Published 15 days ago
Jointly Reinforcing Diversity and Quality in Language Model Generations Paper • 2509.02534 • Published Sep 2
TOOLVERIFIER: Generalization to New Tools via Self-Verification Paper • 2402.14158 • Published Feb 21, 2024
Adaptive Decoding via Latent Preference Optimization Paper • 2411.09661 • Published Nov 14, 2024