Papers
- MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources (paper 2509.25531, published Sep 29)
- Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks (paper 2508.18672, published Aug 26)

Collections
- open-sci-ref releases: Open-sci-ref reference baseline releases (1 item, updated Jun 23)
- Optimal Sparsity Code: Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks (65 items, updated Aug 21)
- Optimal Sparsity Math: Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks (67 items, updated Aug 19)
- open-sci-ref-0.01: research baseline models trained on various open reference datasets (12 items, updated Jul 23)
- SwallowCode: Rewriting Pre-Training Data Boosts LLM Performance in Math and Code (66 items, updated May 7)
- SwallowMath: Rewriting Pre-Training Data Boosts LLM Performance in Math and Code (11 items, updated May 7)
- LLM-jp-3 Pre-trained Models: pre-trained models in the LLM-jp-3 model series (10 items, updated May 28)
- LLM-jp-3 Fine-tuned Models: fine-tuned models in the LLM-jp-3 model series (25 items, updated May 28)