BPE tokenizers with vocabulary sizes ranging from 1k to 131k, trained on OpenWebText, along with the pre-tokenized OpenWebText dataset for each tokenizer.
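
For reference, here is a minimal sketch of loading one of the tokenizers and round-tripping a sample string with the Hugging Face `tokenizers` library. The file name `tokenizer_32k.json` is a hypothetical example; adjust it to the actual file layout, and this assumes the tokenizers were saved in the library's JSON format:

```python
# Minimal sketch: load one of the BPE tokenizers and encode/decode a string.
# Assumes Hugging Face `tokenizers` JSON format and a per-vocab-size file
# naming scheme (e.g. tokenizer_32k.json) -- both are assumptions, not
# guarantees about this repo's layout.
from tokenizers import Tokenizer

tok = Tokenizer.from_file("tokenizer_32k.json")  # hypothetical file name
print(tok.get_vocab_size())                      # e.g. 32000

enc = tok.encode("BPE tokenizers trained on OpenWebText")
print(enc.ids)              # token IDs under this vocabulary
print(tok.decode(enc.ids))  # recovers the original text
```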