MiniCPM-o & MiniCPM-V Collection Multimodal models with leading performance. • 29 items • Updated 9 days ago • 72
Nemotron v3 Pre-Training Collection Large-scale pre-training datasets used in the Nemotron family of models. • 11 items • Updated 4 days ago • 9
Nemotron-Post-Training-v3 Collection Collection of datasets used in the post-training phase of Nemotron Nano and Super v3. • 20 items • Updated about 3 hours ago • 72
Mixture of Tunable Experts - Behavior Modification of DeepSeek-R1 at Inference Time Article • Published Feb 18, 2025 • 35
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16, 2025 • 273
DeepSeek R1 (All Versions) Collection DeepSeek-R1-0528 is here! The most powerful open reasoning LLM, available in GGUF, original & 4-bit formats. Includes Llama & Qwen distilled models. • 37 items • Updated 4 days ago • 265
Qwen2.5-Omni Collection End-to-End Omni (text, audio, image, video, and natural speech interaction) model based on Qwen2.5 • 6 items • Updated 9 days ago • 164
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated Dec 23, 2025 • 309
Qwen2 Collection Qwen2 language models, including pretrained and instruction-tuned models in 5 sizes: 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 37 items • Updated 9 days ago • 376
DBRX Collection DBRX is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. • 3 items • Updated Mar 27, 2024 • 96