Fix metrics table (#8)
- Fix metrics table (4880de929c7a66c12a3967d701629b39df41e7b0)
Co-authored-by: Adarsh <tidealwari@users.noreply.huggingface.co>
README.md
CHANGED
@@ -73,10 +73,10 @@ for i in output:
 
 | **Model** | **QiskitHumanEval-Hard** | **QiskitHumanEval** | **HumanEval** | **ASDiv** | **MathQA** | **SciQ** | **MBPP** | **IFEval** | **CrowsPairs (English)** | **TruthfulQA (MC1 acc)** |
 |-----------|---------------------------|----------------------|---------------|-----------|------------|----------|----------|------------|---------------------------|---------------------------|
-| **qwen2.5-coder-14b-qiskit** | 25.17 | 49.01 | 91.46 | 4.21 | 53.90 | 97.00 | 77.60 | 49.64 | 65.18 | 37.82 |
-| mistral-small-3.2-24b-qiskit | 20.53 | 40.39 | 77.49 | 20.69 | 53.40 | 96.40 | 63.40 | 31.66 | 67.56 | 42.84 |
-| granite-3.3-8b-qiskit | 14.57 | 27.15 | 62.80 | 0.48 | 38.66 | 93.30 | 52.40 | 59.71 | 59.75 | 39.05 |
-| granite-3.2-8b-qiskit | 9.93 | 24.50 | 57.32 | 0.09 | 41.41 | 96.30 | 51.80 | 60.79 | 66.79 | 40.51 |
+| **qwen2.5-coder-14b-qiskit** | **25.17** | **49.01** | **91.46** | 4.21 | **53.90** | **97.00** | **77.60** | 49.64 | 65.18 | 37.82 |
+| mistral-small-3.2-24b-qiskit | 20.53 | 40.39 | 77.49 | **20.69** | 53.40 | 96.40 | 63.40 | 31.66 | 67.56 | **42.84** |
+| granite-3.3-8b-qiskit | 14.57 | 27.15 | 62.80 | 0.48 | 38.66 | 93.30 | 52.40 | 59.71 | **59.75** | 39.05 |
+| granite-3.2-8b-qiskit | 9.93 | 24.50 | 57.32 | 0.09 | 41.41 | 96.30 | 51.80 | **60.79** | 66.79 | 40.51 |
 
 *Note: All models listed in the benchmark table were evaluated using their respective system prompt, defined in their Hugging Face model.*
 
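The updated README table bolds the best score in each benchmark column. The values are unchanged. In every column except CrowsPairs (English), the bolded value is the highest. In CrowsPairs, the bolded value is the lowest, which is consistent with treating that bias metric as lower-is-better. The sketch below reproduces that selection from the table's numbers. It is a minimal illustration, not the tooling used for this commit. The helper names `best_per_column` and `bold_row` are hypothetical, and the lower-is-better treatment of CrowsPairs is an assumption inferred from which cell the diff bolds.

```python
# Scores copied from the updated README table (new side of the diff).
COLUMNS = [
    "QiskitHumanEval-Hard", "QiskitHumanEval", "HumanEval", "ASDiv", "MathQA",
    "SciQ", "MBPP", "IFEval", "CrowsPairs (English)", "TruthfulQA (MC1 acc)",
]

SCORES = {
    "qwen2.5-coder-14b-qiskit": [25.17, 49.01, 91.46, 4.21, 53.90, 97.00, 77.60, 49.64, 65.18, 37.82],
    "mistral-small-3.2-24b-qiskit": [20.53, 40.39, 77.49, 20.69, 53.40, 96.40, 63.40, 31.66, 67.56, 42.84],
    "granite-3.3-8b-qiskit": [14.57, 27.15, 62.80, 0.48, 38.66, 93.30, 52.40, 59.71, 59.75, 39.05],
    "granite-3.2-8b-qiskit": [9.93, 24.50, 57.32, 0.09, 41.41, 96.30, 51.80, 60.79, 66.79, 40.51],
}

# Assumption: every column is higher-is-better except CrowsPairs,
# where the diff bolds the lowest value.
LOWER_IS_BETTER = {"CrowsPairs (English)"}


def best_per_column(scores, columns, lower_is_better):
    """Hypothetical helper: map each column to the model holding its best score."""
    best = {}
    for i, col in enumerate(columns):
        pick = min if col in lower_is_better else max
        # The key reads column i of each model's score list.
        best[col] = pick(scores, key=lambda model: scores[model][i])
    return best


def bold_row(model, scores, columns, best):
    """Hypothetical helper: render one markdown row, bolding best scores only."""
    cells = []
    for i, col in enumerate(columns):
        value = f"{scores[model][i]:.2f}"
        cells.append(f"**{value}**" if best[col] == model else value)
    return "| " + model + " | " + " | ".join(cells) + " |"


if __name__ == "__main__":
    best = best_per_column(SCORES, COLUMNS, LOWER_IS_BETTER)
    for model in SCORES:
        print(bold_row(model, SCORES, COLUMNS, best))
```

The sketch bolds score cells only. The README also bolds the `qwen2.5-coder-14b-qiskit` model name in the first column, which this sketch does not reproduce. Ties are not handled specially, and none occur in this table.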