LLM Evaluation - a EdenYav Collection

EdenYav 's Collections

LLM in Cybersecurity

LLM Evaluation

updated Aug 30

Are We on the Right Way for Assessing Document Retrieval-Augmented Generation?

Paper • 2508.03644 • Published Aug 5 • 25
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Paper • 2508.05748 • Published Aug 7 • 139
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

Paper • 2508.20453 • Published Aug 28 • 63