Are We on the Right Way for Assessing Document Retrieval-Augmented Generation? Paper • 2508.03644 • Published Aug 5 • 25
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent Paper • 2508.05748 • Published Aug 7 • 139
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers Paper • 2508.20453 • Published Aug 28 • 63