R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning Paper • 2503.05592 • Published Mar 7 • 27
Learning from Failures in Multi-Attempt Reinforcement Learning Paper • 2503.04808 • Published Mar 4 • 18
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization Paper • 2503.12937 • Published Mar 17 • 30