In-the-Flow Agentic System Optimization for Effective Planning and Tool Use Paper • 2510.05592 • Published Oct 7 • 101
The Invisible Leash: Why RLVR May Not Escape Its Origin Paper • 2507.14843 • Published Jul 20 • 84
AI2 Safety Toolkit Collection Safety data, moderation tools and safe LLMs. • 6 items • Updated Apr 30 • 8
Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues Paper • 2506.00958 • Published Jun 1 • 20
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation Paper • 2505.18842 • Published May 24 • 36
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30 • 141
NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning Paper • 2504.13941 • Published Apr 15 • 11