MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools Paper • 2509.09734 • Published Sep 10 • 15
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents Paper • 2506.11763 • Published Jun 13 • 71
From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding Paper • 2506.03968 • Published Jun 4 • 15 • 2
SynthQuestions Collection Data and models for the paper From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding • 4 items • Updated Jun 11