Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models Paper • 2504.05262 • Published Apr 7 • 11
Medical QA Datasets Collection A collection of medical question answering (QA) datasets • 23 items • Updated Feb 22 • 45
The Big Benchmarks Collection Collection Gathering benchmark spaces on the hub (beyond the Open LLM Leaderboard) • 13 items • Updated Nov 18, 2024 • 252