SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios Paper • 2512.18470 • Published 8 days ago • 8
CodeWiki: Evaluating AI's Ability to Generate Holistic Documentation for Large-Scale Codebases Paper • 2510.24428 • Published Oct 28 • 2
CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding Capabilities of CodeLLMs Paper • 2410.01999 • Published Oct 2, 2024 • 10
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale Paper • 2409.16299 • Published Sep 9, 2024 • 11
Learning to Predict Program Execution by Modeling Dynamic Dependency on Code Graphs Paper • 2408.02816 • Published Aug 5, 2024 • 5