ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback • Paper • 2601.10156 • Published 4 days ago • 21
Temporal Self-Rewarding Language Models: Decoupling Chosen-Rejected via Past-Future • Paper • 2508.06026 • Published Aug 8, 2025 • 15
Outcome-Refining Process Supervision for Code Generation • Paper • 2412.15118 • Published Dec 19, 2024 • 19