FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning • Paper • 2510.22543 • Published Oct 26, 2025
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch • Paper • 2410.18693 • Published Oct 24, 2024