BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent Paper • 2508.06600 • Published Aug 8 • 41
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent Paper • 2508.06600 • Published Aug 8 • 41
FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents Paper • 2504.13128 • Published Apr 17 • 7