LLM Evaluation Benchmarks - an Alanox Collection

LLM Evaluation Benchmarks

updated Apr 7, 2025

This collection gathers references to the evaluation benchmarks commonly reported in traditional LLM papers.

MMLU-Pro Leaderboard

More advanced and challenging multi-task evaluation

GAIA Leaderboard

Submit and evaluate models on the GAIA leaderboard