Add evaluation results for GPQA, HLE

#3
by SaylorTwift HF Staff - opened

Evaluation Results

This PR adds evaluation results extracted from the Model Card.

Benchmarks:

  • GPQA: 85.2
  • HLE: 19.4

Files created:

  • .eval_results/gpqa.yaml
  • .eval_results/hle.yaml

yuanhe134 changed pull request status to merged
