---

# Doge-tokenizer

Tokenizer for models trained on the [smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus), with support for reasoning fine-tuning in the style of R1.

This tokenizer was trained on 2M samples from:

- FineWeb-Edu 70%
- Cosmopedia v2 20%
- Python-Edu 5%
- FineMath 5%
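As a rough illustration of what training a subword tokenizer on such a corpus involves, here is a minimal byte-pair-encoding merge-learning sketch. This is not the actual training code for Doge-tokenizer (which was presumably built with standard tokenizer-training tooling); the toy word frequencies and merge count are invented for the example.

```python
from collections import Counter

def train_bpe(words, num_merges):
    """Learn BPE merges from a word-frequency dict (a toy stand-in for
    the smollm-corpus sample mixture described above)."""
    # Represent each word as a tuple of symbols, starting from characters.
    vocab = Counter()
    for word, freq in words.items():
        vocab[tuple(word)] += freq

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge everywhere in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

merges = train_bpe({"lower": 5, "lowest": 2}, num_merges=3)
print(merges)  # most frequent adjacent pairs are merged first
```

Real tokenizer training differs mainly in scale (a large byte-level vocabulary, millions of documents) but follows the same frequency-driven merge loop.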