LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training Paper • 2510.14969 • Published 25 days ago • 5
Vibe Checker: Aligning Code Evaluation with Human Preference Paper • 2510.07315 • Published Oct 8 • 30
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search Paper • 2502.02584 • Published Feb 4 • 17