Article Tokenization in Transformers v5: Simpler, Clearer, and More Modular +4 24 days ago • 110
Pre-training Dataset Samples Collection A collection of pre-training datasets samples of sizes 10M, 100M and 1B tokens. Ideal for use in quick experimentation and ablations. • 19 items • Updated 17 days ago • 18
Article A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons Feb 4, 2025 • 28
AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions Paper • 2509.13523 • Published Sep 16, 2025 • 7
Nemotron-Pre-Training-Datasets Collection Large scale pre-training datasets used in the Nemotron family of models. • 11 items • Updated 18 days ago • 91
SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video Generation via Spherical Latent Representation Paper • 2504.14396 • Published Apr 19, 2025 • 27
Article From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub +2 Feb 12, 2025 • 81