dvruette's Collections
OpenWebText BPE

updated 8 days ago

BPE tokenizers with vocabulary sizes from 1k to 131k, trained on OpenWebText, along with the pre-tokenized OpenWebText dataset for each tokenizer.

  • dvruette/openwebtext-bpe-1k

    Updated Dec 17, 2025

  • dvruette/openwebtext-bpe-2k

    Updated Dec 17, 2025

  • dvruette/openwebtext-bpe-4k

    Updated Dec 17, 2025

  • dvruette/openwebtext-bpe-8k

    Updated Dec 17, 2025

  • dvruette/openwebtext-bpe-16k

    Updated Dec 17, 2025

  • dvruette/openwebtext-bpe-33k

    Updated Dec 17, 2025

  • dvruette/openwebtext-bpe-66k

    Updated Dec 17, 2025

  • dvruette/openwebtext-bpe-131k

    Updated Dec 17, 2025

  • dvruette/openwebtext-tokenized-1k

    Viewer • Updated Dec 19, 2025 • 8.01M • 5

  • dvruette/openwebtext-tokenized-2k

    Viewer • Updated Dec 19, 2025 • 8.01M • 4

  • dvruette/openwebtext-tokenized-4k

    Viewer • Updated Dec 19, 2025 • 8.01M • 3

  • dvruette/openwebtext-tokenized-8k

    Viewer • Updated Dec 19, 2025 • 8.01M • 3

  • dvruette/openwebtext-tokenized-16k

    Viewer • Updated Dec 19, 2025 • 8.01M • 5

  • dvruette/openwebtext-tokenized-33k

    Viewer • Updated Dec 19, 2025 • 8.01M • 19

  • dvruette/openwebtext-tokenized-66k

    Viewer • Updated Dec 19, 2025 • 8.01M • 8

  • dvruette/openwebtext-tokenized-131k

    Viewer • Updated Dec 19, 2025 • 8.01M • 3
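
The repositories above follow a regular naming pattern: one `openwebtext-bpe-<size>` tokenizer and one `openwebtext-tokenized-<size>` dataset per vocabulary size. The sketch below builds the matching pair of repository IDs from a size label. The `load_pair` helper is hypothetical: it assumes the tokenizer repositories load with `transformers.AutoTokenizer`, that the datasets load with `datasets.load_dataset`, and that a `train` split exists. None of that is confirmed by this page.

```python
# Vocabulary-size labels exactly as they appear in the collection listing.
VOCAB_LABELS = ["1k", "2k", "4k", "8k", "16k", "33k", "66k", "131k"]
OWNER = "dvruette"


def repo_ids(label: str) -> dict:
    """Return the tokenizer and dataset repository IDs for one size label."""
    if label not in VOCAB_LABELS:
        raise ValueError(f"unknown vocab label {label!r}; expected one of {VOCAB_LABELS}")
    return {
        "tokenizer": f"{OWNER}/openwebtext-bpe-{label}",
        "dataset": f"{OWNER}/openwebtext-tokenized-{label}",
    }


def load_pair(label: str, split: str = "train"):
    """Hypothetical helper: load a tokenizer and its pre-tokenized dataset.

    Assumptions (not confirmed by the collection page): the tokenizer repo
    is compatible with AutoTokenizer, and the dataset has a split named
    `split`. Requires the `transformers` and `datasets` packages and
    network access to the Hugging Face Hub.
    """
    # Imported lazily so repo_ids() works without these packages installed.
    from transformers import AutoTokenizer
    from datasets import load_dataset

    ids = repo_ids(label)
    tokenizer = AutoTokenizer.from_pretrained(ids["tokenizer"])
    # Streaming avoids downloading the full dataset before the first read.
    dataset = load_dataset(ids["dataset"], split=split, streaming=True)
    return tokenizer, dataset


if __name__ == "__main__":
    for label in VOCAB_LABELS:
        print(repo_ids(label))
```

Pairing by label matters because each pre-tokenized dataset was produced with the tokenizer of the same vocabulary size, so token IDs from one size are not interchangeable with another.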