Fix metrics table (#8)

- Fix metrics table (4880de929c7a66c12a3967d701629b39df41e7b0)

Co-authored-by: Adarsh <tidealwari@users.noreply.huggingface.co>

Files changed (1)

README.md CHANGED

@@ -73,10 +73,10 @@ for i in output:
 | **Model** | **QiskitHumanEval-Hard** | **QiskitHumanEval** | **HumanEval** | **ASDiv** | **MathQA** | **SciQ** | **MBPP** | **IFEval** | **CrowsPairs (English)** | **TruthfulQA (MC1 acc)** |
 |-----------|---------------------------|----------------------|---------------|-----------|------------|----------|----------|------------|---------------------------|---------------------------|
-| **qwen2.5-coder-14b-qiskit** | 25.17 | 49.01 | 91.46 | 4.21 | 53.90 | 97.00 | 77.60 | 49.64 | 65.18 | 37.82 |
-| mistral-small-3.2-24b-qiskit | 20.53 | 40.39 | 77.49 | 20.69 | 53.40 | 96.40 | 63.40 | 31.66 | 67.56 | 42.84 |
-| granite-3.3-8b-qiskit | 14.57 | 27.15 | 62.80 | 0.48 | 38.66 | 93.30 | 52.40 | 59.71 | 59.75 | 39.05 |
-| granite-3.2-8b-qiskit | 9.93 | 24.50 | 57.32 | 0.09 | 41.41 | 96.30 | 51.80 | 60.79 | 66.79 | 40.51 |
+| **qwen2.5-coder-14b-qiskit** | **25.17** | **49.01** | **91.46** | 4.21 | **53.90** | **97.00** | **77.60** | 49.64 | 65.18 | 37.82 |
+| mistral-small-3.2-24b-qiskit | 20.53 | 40.39 | 77.49 | **20.69** | 53.40 | 96.40 | 63.40 | 31.66 | 67.56 | **42.84** |
+| granite-3.3-8b-qiskit | 14.57 | 27.15 | 62.80 | 0.48 | 38.66 | 93.30 | 52.40 | 59.71 | **59.75** | 39.05 |
+| granite-3.2-8b-qiskit | 9.93 | 24.50 | 57.32 | 0.09 | 41.41 | 96.30 | 51.80 | **60.79** | 66.79 | 40.51 |
 *Note: All models listed in the benchmark table were evaluated using their respective system prompt, defined in their Hugging Face model.*