A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces Paper • 2602.03442 • Published 4 days ago • 18
Wiki Live Challenge: Challenging Deep Research Agents with Expert-Level Wikipedia Articles Paper • 2602.01590 • Published 5 days ago • 32
WildGraphBench: Benchmarking GraphRAG with Wild-Source Corpora Paper • 2602.02053 • Published 5 days ago • 40
FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents Paper • 2602.01566 • Published 5 days ago • 44
ExpertPrompting: Instructing Large Language Models to be Distinguished Experts Paper • 2305.14688 • Published May 24, 2023
Benchmarking Large Language Models on Controllable Generation under Diversified Instructions Paper • 2401.00690 • Published Jan 1, 2024 • 1
Building Chinese Biomedical Language Models via Multi-Level Text Discrimination Paper • 2110.07244 • Published Oct 14, 2021
Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning Paper • 2311.08182 • Published Nov 14, 2023
Rationales Are Not Silver Bullets: Measuring the Impact of Rationales on Model Performance and Reliability Paper • 2505.24147 • Published May 30, 2025
From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding Paper • 2506.03968 • Published Jun 4, 2025 • 15
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents Paper • 2506.11763 • Published Jun 13, 2025 • 74
Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking Paper • 2505.20023 • Published May 26, 2025
MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools Paper • 2509.09734 • Published Sep 10, 2025 • 16
Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study Paper • 2411.02462 • Published Nov 4, 2024 • 9