Benchmarks - a hppdqdq Collection

Benchmarks

updated Jan 13, 2025

Running on CPU Upgrade

241

MMLU-Pro Leaderboard

🥇

More advanced and challenging multi-task evaluation
Running

59

Stick To Your Role! Leaderboard

🎭

Benchmarking LLMs on the stability of simulated populations
Running

53

ZeroEval Leaderboard

📊

Embed ZeroEval for evaluation
Runtime error

26

Decentralized Arena Leaderboard

🥇

View and compare LLM evaluations across various domains
Runtime error

Featured

433

Open Medical-LLM Leaderboard

🥇

Explore and submit models for benchmarking
Running

352

GPU Poor LLM Arena

🏆

Compact LLM Battle Arena: Frugal AI Face-Off!
Running

Featured

130

Open VLM Video Leaderboard

🌎

VLMEvalKit Eval Results in video understanding benchmark
Running on CPU Upgrade

13.9k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots
Running

463

TTS Spaces Arena

🤗

Blind vote on HF TTS models!