cbjuan tidealwari committed on
Commit 6d638fb · verified · 1 Parent(s): fee0955

Fix metrics table (#8)


- Fix metrics table (4880de929c7a66c12a3967d701629b39df41e7b0)


Co-authored-by: Adarsh <tidealwari@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -73,10 +73,10 @@ for i in output:
 
  | **Model** | **QiskitHumanEval-Hard** | **QiskitHumanEval** | **HumanEval** | **ASDiv** | **MathQA** | **SciQ** | **MBPP** | **IFEval** | **CrowsPairs (English)** | **TruthfulQA (MC1 acc)** |
  |-----------|---------------------------|----------------------|---------------|-----------|------------|----------|----------|------------|---------------------------|---------------------------|
- | **qwen2.5-coder-14b-qiskit** | 25.17 | 49.01 | 91.46 | 4.21 | 53.90 | 97.00 | 77.60 | 49.64 | 65.18 | 37.82 |
- | mistral-small-3.2-24b-qiskit | 20.53 | 40.39 | 77.49 | 20.69 | 53.40 | 96.40 | 63.40 | 31.66 | 67.56 | 42.84 |
- | granite-3.3-8b-qiskit | 14.57 | 27.15 | 62.80 | 0.48 | 38.66 | 93.30 | 52.40 | 59.71 | 59.75 | 39.05 |
- | granite-3.2-8b-qiskit | 9.93 | 24.50 | 57.32 | 0.09 | 41.41 | 96.30 | 51.80 | 60.79 | 66.79 | 40.51 |
+ | **qwen2.5-coder-14b-qiskit** | **25.17** | **49.01** | **91.46** | 4.21 | **53.90** | **97.00** | **77.60** | 49.64 | 65.18 | 37.82 |
+ | mistral-small-3.2-24b-qiskit | 20.53 | 40.39 | 77.49 | **20.69** | 53.40 | 96.40 | 63.40 | 31.66 | 67.56 | **42.84** |
+ | granite-3.3-8b-qiskit | 14.57 | 27.15 | 62.80 | 0.48 | 38.66 | 93.30 | 52.40 | 59.71 | **59.75** | 39.05 |
+ | granite-3.2-8b-qiskit | 9.93 | 24.50 | 57.32 | 0.09 | 41.41 | 96.30 | 51.80 | **60.79** | 66.79 | 40.51 |
 
  *Note: All models listed in the benchmark table were evaluated using their respective system prompt, defined in their Hugging Face model.*
 
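
The bolding added by this commit appears to mark the best score in each column: the highest value everywhere except CrowsPairs (English), where the lowest value is bolded. That rule is inferred from the numbers in the table, not stated in the commit, so treat it as an assumption. A minimal sketch of how such bolding could be regenerated or checked follows; `bold_best`, `ROWS`, `HEADERS`, and `LOWER_IS_BETTER` are hypothetical names, not part of the repository.

```python
# Hypothetical helper for checking the bolding in the metrics table.
# Assumption: "best" means highest, except for columns in LOWER_IS_BETTER.

HEADERS = [
    "QiskitHumanEval-Hard", "QiskitHumanEval", "HumanEval", "ASDiv", "MathQA",
    "SciQ", "MBPP", "IFEval", "CrowsPairs (English)", "TruthfulQA (MC1 acc)",
]

# Scores are kept as strings so formatting such as "97.00" is preserved.
ROWS = {
    "qwen2.5-coder-14b-qiskit": ["25.17", "49.01", "91.46", "4.21", "53.90",
                                 "97.00", "77.60", "49.64", "65.18", "37.82"],
    "mistral-small-3.2-24b-qiskit": ["20.53", "40.39", "77.49", "20.69", "53.40",
                                     "96.40", "63.40", "31.66", "67.56", "42.84"],
    "granite-3.3-8b-qiskit": ["14.57", "27.15", "62.80", "0.48", "38.66",
                              "93.30", "52.40", "59.71", "59.75", "39.05"],
    "granite-3.2-8b-qiskit": ["9.93", "24.50", "57.32", "0.09", "41.41",
                              "96.30", "51.80", "60.79", "66.79", "40.51"],
}

# Assumption inferred from the table: only this column bolds the lowest value.
LOWER_IS_BETTER = {"CrowsPairs (English)"}


def bold_best(rows, headers, lower_is_better):
    """Return a copy of rows with the best cell of each column wrapped in **."""
    names = list(rows)
    best_model = []
    for col, header in enumerate(headers):
        pick = min if header in lower_is_better else max
        best_model.append(pick(names, key=lambda n: float(rows[n][col])))
    return {
        name: [
            f"**{value}**" if best_model[col] == name else value
            for col, value in enumerate(rows[name])
        ]
        for name in names
    }


if __name__ == "__main__":
    for name, cells in bold_best(ROWS, HEADERS, LOWER_IS_BETTER).items():
        print(f"| {name} | " + " | ".join(cells) + " |")
```

Ties are resolved in favor of the first model listed, which does not matter here because no column in this table has a tie.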