Alibaba-NLP/Tongyi-DeepResearch-30B-A3B • Text Generation • 31B • Updated 30 days ago • 12.7k • 752
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation • Paper • 2506.09991 • Published Jun 11 • 55
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models • Paper • 2505.24864 • Published May 30 • 138
nvidia/Nemotron-Research-Reasoning-Qwen-1.5B • Text Generation • 2B • Updated Aug 12 • 14.7k • 222
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs • Paper • 2504.11536 • Published Apr 15 • 63