RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation Paper • 2509.16198 • Published Sep 19 • 127
Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents Paper • 2509.09265 • Published Sep 11 • 46
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model Paper • 2509.00676 • Published Aug 31 • 83
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning Paper • 2509.02544 • Published Sep 2 • 123
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published Sep 2 • 83
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published Sep 1 • 73
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2 • 224
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19 • 118
OpenCUA: Open Foundations for Computer-Use Agents Collection This is the official versions of OpenCUA models and AgentNet datasets. Website: https://opencua.xlang.ai/ • 8 items • Updated 29 days ago • 20
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent Paper • 2508.06600 • Published Aug 8 • 40
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? Paper • 2507.12415 • Published Jul 16 • 42
Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows Paper • 2411.07763 • Published Nov 12, 2024 • 2
S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models Paper • 2310.15147 • Published Oct 23, 2023 • 2
MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models Paper • 2402.12851 • Published Feb 20, 2024 • 2