Reinforcement Learning for Self-Improving Agent with Skill Library Paper • 2512.17102 • Published Dec 18, 2025 • 32
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies Paper • 2512.19673 • Published about 1 month ago • 62
LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics Paper • 2512.21010 • Published 29 days ago • 3
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models Paper • 2512.19995 • Published about 1 month ago • 15
Rethinking Sample Polarity in Reinforcement Learning with Verifiable Rewards Paper • 2512.21625 • Published 28 days ago • 3
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization Paper • 2512.24615 • Published 22 days ago • 114
E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models Paper • 2601.00423 • Published 21 days ago • 8
Large Reasoning Models Are (Not Yet) Multilingual Latent Reasoners Paper • 2601.02996 • Published 16 days ago • 4
MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning Paper • 2512.23412 • Published 24 days ago • 37
SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence Paper • 2512.22334 • Published 27 days ago • 34
GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators Paper • 2512.19682 • Published about 1 month ago • 15
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs Paper • 2512.17008 • Published Dec 18, 2025 • 10
Are We on the Right Way to Assessing LLM-as-a-Judge? Paper • 2512.16041 • Published Dec 17, 2025 • 32
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience Paper • 2512.17260 • Published Dec 19, 2025 • 49
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows Paper • 2512.16969 • Published Dec 18, 2025 • 115
Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image Paper • 2512.16899 • Published Dec 18, 2025 • 12