---
base_model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
pipeline_tag: text-generation
inference: true
language:
- en
license: mit
model_creator: deepseek-ai
model_name: DeepSeek-R1-0528-Qwen3-8B
model_type: qwen3
quantized_by: brittlewis12
tags:
- reasoning
- deepseek
- qwen3
---

# DeepSeek R1 0528 Qwen3 8B GGUF

**Original model**: [DeepSeek-R1-0528-Qwen3-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B)

**Model creator**: [DeepSeek AI](https://huggingface.co/deepseek-ai)

> We distilled the chain-of-thought from DeepSeek-R1-0528 to post-train Qwen3 8B Base, obtaining DeepSeek-R1-0528-Qwen3-8B. This model achieves state-of-the-art (SOTA) performance among open-source models on the AIME 2024, surpassing Qwen3 8B by +10.0% and matching the performance of Qwen3-235B-thinking. We believe that the chain-of-thought from DeepSeek-R1-0528 will hold significant importance for both academic research on reasoning models and industrial development focused on small-scale models.

This repo contains GGUF format model files for DeepSeek AI's _DeepSeek R1 0528 Qwen3 8B_.

### What is GGUF?

GGUF is a file format for representing AI models. It is the third version of the format, introduced by the llama.cpp team on August 21st 2023.

Converted with llama.cpp build b5536 (revision [2b13162](https://github.com/ggml-org/llama.cpp/commits/2b131621e60d8ec2cc961201beb6773ab37b6b69)), using [autogguf-rs](https://github.com/brittlewis12/autogguf-rs).

### Prompt template: [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B/blob/main/tokenizer_config.json#L34)

```
{{system_message}}
<|User|>{{prompt}}<|Assistant|>
```

### Notes from DeepSeek on Running Locally

> Compared to previous versions of DeepSeek-R1, the usage recommendations for DeepSeek-R1-0528 have the following changes:
>
> - System prompt is supported now.
> - It is not required to add `\n` at the beginning of the output to force the model into thinking pattern.
>
> The model architecture of DeepSeek-R1-0528-Qwen3-8B is identical to that of Qwen3-8B, but it shares the same tokenizer configuration as DeepSeek-R1-0528.

---

## Download & run with [cnvrs](https://twitter.com/cnvrsai) on iPhone, iPad, and Mac!

![cnvrs.ai](https://pbs.twimg.com/profile_images/1744049151241797632/0mIP-P9e_400x400.jpg)

[cnvrs](https://testflight.apple.com/join/sFWReS7K) is the best app for private, local AI on your device:
- create & save **Characters** with custom system prompts & temperature settings
- download and experiment with any **GGUF model** you can [find on HuggingFace](https://huggingface.co/models?library=gguf)!
  * or, use an API key with the chat completions-compatible model provider of your choice -- ChatGPT, Claude, Gemini, DeepSeek, & more!
- make it your own with custom **Theme colors**
- powered by Metal ⚡️ & [Llama.cpp](https://github.com/ggml-org/llama.cpp), with **haptics** during response streaming!
- **try it out** yourself today, on [Testflight](https://testflight.apple.com/join/sFWReS7K)!
  * if you **already have the app**, download DeepSeek R1 0528 Qwen3 8B now!
- follow [cnvrs on twitter](https://twitter.com/cnvrsai) to stay up to date

---

## Original Model Evaluation

> We distilled the chain-of-thought from DeepSeek-R1-0528 to post-train Qwen3 8B Base, obtaining DeepSeek-R1-0528-Qwen3-8B. This model achieves state-of-the-art (SOTA) performance among open-source models on the AIME 2024, surpassing Qwen3 8B by +10.0% and matching the performance of Qwen3-235B-thinking.


|                                | AIME 24 | AIME 25 | HMMT Feb 25 | GPQA Diamond | LiveCodeBench (2408-2505) |
|--------------------------------|---------|---------|-------------|--------------|---------------------------|
| Qwen3-235B-A22B                | 85.7    | 81.5    | 62.5        | 71.1         | 66.5                      |
| Qwen3-32B                      | 81.4    | 72.9    | -           | 68.4         | -                         |
| Qwen3-8B                       | 76.0    | 67.3    | -           | 62.0         | -                         |
| Phi-4-Reasoning-Plus-14B       | 81.3    | 78.0    | 53.6        | 69.3         | -                         |
| Gemini-2.5-Flash-Thinking-0520 | 82.3    | 72.0    | 64.2        | 82.8         | 62.3                      |
| o3-mini (medium)               | 79.6    | 76.7    | 53.3        | 76.8         | 65.9                      |
| **DeepSeek-R1-0528-Qwen3-8B**  | **86.0** | **76.3** | **61.5** | **61.1** | **60.5** |

---

## DeepSeek R1 0528 Qwen3 8B in cnvrs on iOS

![deepseek-r1-qwen3-8b in cnvrs pt1](https://cdn-uploads.huggingface.co/production/uploads/63b64d7a889aa6707f155cdb/nsXnOaK6Sb-0PGvdY8ayy.png)

![deepseek-r1-qwen3-8b in cnvrs pt2](https://cdn-uploads.huggingface.co/production/uploads/63b64d7a889aa6707f155cdb/4AnhMFL41EuIwhKuVaCGi.png)

---
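
For anyone assembling raw prompts by hand, the prompt template given earlier in this card can be filled in with a few lines of Python. This is only a minimal sketch: `build_prompt` and its `separator` parameter are hypothetical names introduced here, the tag strings are copied verbatim from this card, and the whitespace between the system message and the user turn is an assumption that should be checked against the linked `tokenizer_config.json`.

```python
USER_TAG = "<|User|>"            # copied verbatim from this card's template
ASSISTANT_TAG = "<|Assistant|>"  # copied verbatim from this card's template


def build_prompt(prompt: str, system_message: str = "", separator: str = "\n") -> str:
    """Fill the card's prompt template for a single user turn.

    `separator` is an assumption (hypothetical parameter): the card does not
    state what whitespace, if any, follows the system message.
    """
    # Leave the system part out entirely when no system message is given.
    head = f"{system_message}{separator}" if system_message else ""
    # End on the assistant tag so the model generates the reply from there.
    return f"{head}{USER_TAG}{prompt}{ASSISTANT_TAG}"


if __name__ == "__main__":
    print(build_prompt("What is 2 + 2?", system_message="You are a helpful assistant."))
```

Passing `separator=""` yields the template with no whitespace after the system message, which may be worth trying if the default does not match the tokenizer configuration.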