---

# Doge-tokenizer

Tokenizer for models trained on the [smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus), with support for reasoning fine-tuning in the style of R1.

This tokenizer was trained on 2M samples from:

- FineWeb-Edu 70%
- Cosmopedia v2 20%
- Python-Edu 5%
- FineMath 5%
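As a rough illustration of what training a subword tokenizer on such a corpus involves, here is a minimal byte-pair-encoding merge-learning sketch. This is not the actual training code for Doge-tokenizer (which was presumably built with standard tokenizer-training tooling); the toy word frequencies and merge count are invented for the example.

```python
from collections import Counter

def train_bpe(words, num_merges):
    """Learn BPE merges from a word-frequency dict (a toy stand-in for
    the smollm-corpus sample mixture described above)."""
    # Represent each word as a tuple of symbols, starting from characters.
    vocab = Counter()
    for word, freq in words.items():
        vocab[tuple(word)] += freq

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge everywhere in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

merges = train_bpe({"lower": 5, "lowest": 2}, num_merges=3)
print(merges)  # most frequent adjacent pairs are merged first
```

Real tokenizer training differs mainly in scale (a large byte-level vocabulary, millions of documents) but follows the same frequency-driven merge loop.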