Version of the gpt-oss tokenizer (o200k_harmony), filtered to exclude tokens that contain characters outside the Latin, Cyrillic, Greek, and Georgian scripts (characters in the Unicode Common and Unknown scripts are also kept). It was created with https://github.com/spyysalo/tokenizer-filter/ as follows:

```bash
python3 filter_by_script.py openai/gpt-oss-120b Latin Cyrillic Greek Georgian --save-dir harmony-latin-plus
```
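The per-token script test can be illustrated with a small standard-library sketch. This is not how `filter_by_script.py` is implemented. Python's `unicodedata` has no Script property, so the sketch makes a hypothetical approximation: it treats the first word of a character's Unicode name (e.g. `CYRILLIC SMALL LETTER A`) as its script. It checks only letters and marks and treats everything else, including unassigned code points, as Common/Unknown. `ALLOWED_SCRIPTS`, `token_allowed`, and `filter_vocab` are illustrative names, and the sketch works on decoded strings rather than the tokenizer's byte-level tokens.

```python
import unicodedata

# Hypothetical stand-in for a real Unicode Script lookup: the first word of a
# character's name is taken as its script (a heuristic, not the tool's method).
ALLOWED_SCRIPTS = ("LATIN", "CYRILLIC", "GREEK", "GEORGIAN")


def char_allowed(ch: str) -> bool:
    """Approximate whether a character is in an allowed script or Common/Unknown."""
    category = unicodedata.category(ch)
    # Only letters (L*) and marks (M*) are checked; punctuation, symbols,
    # numbers, separators, and unassigned code points (Cn) are treated as
    # Common/Unknown and allowed.
    if not category.startswith(("L", "M")):
        return True
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] in ALLOWED_SCRIPTS


def token_allowed(token: str) -> bool:
    """A token is kept only if every character passes the script test."""
    return all(char_allowed(ch) for ch in token)


def filter_vocab(vocab: dict[str, int]) -> dict[str, int]:
    """Drop tokens containing characters from any other script."""
    return {tok: idx for tok, idx in vocab.items() if token_allowed(tok)}


if __name__ == "__main__":
    sample = {"hello": 0, " мир": 1, "αλφα": 2, "გამარჯობა": 3, "你好": 4, "!?": 5}
    print(sorted(filter_vocab(sample)))
```

The name-prefix heuristic misclassifies some characters. For example, it drops fullwidth Latin letters and combining accents, which belong to the Inherited script. A faithful filter would need an actual Script property lookup.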