Hugging Face
Alanox's Collections
LLM Evaluation Benchmarks

updated Apr 7, 2025

This collection gathers references to the evaluation benchmarks we see in traditional LLM papers.


  • MMLU-Pro Leaderboard 🥇

    More advanced and challenging multi-task evaluation


  • GAIA Leaderboard 🦾

    Submit and evaluate models on the GAIA leaderboard
