Executable Knowledge Graphs for Replicating AI Research Paper • 2510.17795 • Published 22 days ago • 13
LightMem: Lightweight and Efficient Memory-Augmented Generation Paper • 2510.18866 • Published 21 days ago • 108
When Benchmarks Age: Temporal Misalignment through Large Language Model Factuality Evaluation Paper • 2510.07238 • Published Oct 8 • 14
Towards Personalized Deep Research: Benchmarks and Evaluations Paper • 2509.25106 • Published Sep 29 • 28
OceanGym: A Benchmark Environment for Underwater Embodied Agents Paper • 2509.26536 • Published Sep 30 • 34
ReCode: Updating Code API Knowledge with Reinforcement Learning Paper • 2506.20495 • Published Jun 25 • 9
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study Paper • 2506.19794 • Published Jun 24 • 8
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation Paper • 2506.09790 • Published Jun 11 • 53
ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark Paper • 2506.10960 • Published Jun 12 • 12
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science Paper • 2506.10974 • Published Jun 12 • 18
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? Paper • 2505.21374 • Published May 27 • 27
ZJUKLAB at SemEval-2025 Task 4: Unlearning via Model Merging Paper • 2503.21088 • Published Mar 27 • 8