Update README.md
README.md
CHANGED
@@ -26,8 +26,9 @@ The AquilaChat2-34B model is close to or exceeds the level of GPT3.5 in the subj
 
 The additional details of the Aquila model will be presented in the official technical report. Please stay tuned for updates on official channels.
 
+### Note
 <p>
-
+We have discovered a data leakage problem with the GSM8K test data in the pre-training task dataset. Therefore, the evaluation results of GSM8K have been removed from the evaluation results.
 
 Upon thorough investigation and analysis, it was found that the data leakage occurred in the mathematical dataset A (over 2 million samples), recommended by a team we have collaborated with multiple times. This dataset includes the untreated GSM8K test set (1319 samples). The team only performed routine de-duplication and quality checks but did not conduct an extra filtering check for the presence of the GSM8K test data, resulting in this oversight.
 
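
The note above says an extra filtering check for GSM8K test data was not run on dataset A. A minimal sketch of what such a check might look like is given below. It is not the Aquila team's actual pipeline: the function names `normalize` and `find_leaked` and the sample strings are hypothetical, and the matching rule (a normalized test question appearing as a whole-word run inside a training sample) is an assumption chosen for simplicity.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_leaked(train_samples, test_questions):
    """Return indices of training samples that contain a normalized test question.

    Hypothetical helper: both sides are padded with spaces so a question only
    matches on whole-word boundaries, not in the middle of another word.
    """
    needles = [f" {n} " for n in (normalize(q) for q in test_questions) if n]
    leaked = []
    for i, sample in enumerate(train_samples):
        hay = f" {normalize(sample)} "
        if any(needle in hay for needle in needles):
            leaked.append(i)
    return leaked


if __name__ == "__main__":
    # Made-up strings for illustration only; these are not real GSM8K items.
    train = [
        "Q: Tom has 3 apples and buys 2 more. How many apples? A: 5",
        "An unrelated sample about triangles.",
    ]
    test = ["Tom has 3 apples, and buys 2 more. How many apples?"]
    print(find_leaked(train, test))
```

This scans every training sample against every test question, which is slow for a dataset of over 2 million samples and 1319 test items. An index of hashed n-grams would scale better, at the cost of a more involved sketch.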