---
base_model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
pipeline_tag: text-generation
inference: true
language:
- en
license: mit
model_creator: deepseek-ai
model_name: DeepSeek-R1-0528-Qwen3-8B
model_type: qwen3
quantized_by: brittlewis12
tags:
- reasoning
- deepseek
- qwen3
---

# DeepSeek R1 0528 Qwen3 8B GGUF

**Original model**: [DeepSeek-R1-0528-Qwen3-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B)

**Model creator**: [DeepSeek AI](https://huggingface.co/deepseek-ai)

> We distilled the chain-of-thought from DeepSeek-R1-0528 to post-train Qwen3 8B Base, obtaining DeepSeek-R1-0528-Qwen3-8B. This model achieves state-of-the-art (SOTA) performance among open-source models on the AIME 2024, surpassing Qwen3 8B by +10.0% and matching the performance of Qwen3-235B-thinking. We believe that the chain-of-thought from DeepSeek-R1-0528 will hold significant importance for both academic research on reasoning models and industrial development focused on small-scale models.

This repo contains GGUF format model files for DeepSeek AI's _DeepSeek R1 0528 Qwen3 8B_.

### What is GGUF?

GGUF is a file format for representing AI models for local inference. It is the third version of the llama.cpp model file format, introduced by the llama.cpp team on August 21st, 2023.

Converted with llama.cpp build b5536 (revision [2b13162](https://github.com/ggml-org/llama.cpp/commits/2b131621e60d8ec2cc961201beb6773ab37b6b69)), using [autogguf-rs](https://github.com/brittlewis12/autogguf-rs).
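As a sketch of what the format looks like on disk: every GGUF file begins with a small fixed header containing the magic bytes `GGUF`, the format version, and the tensor and metadata-entry counts. A minimal Python reader for just that header (the function name is illustrative, not part of any library):

```python
import struct

def parse_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF file header from the first 24 bytes."""
    if data[:4] != b"GGUF":
        raise ValueError("not a GGUF file")
    # little-endian: uint32 version, uint64 tensor_count, uint64 metadata_kv_count
    version, tensor_count, kv_count = struct.unpack("<IQQ", data[4:24])
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

The rest of the file (metadata key-value pairs, tensor descriptors, and tensor data) follows this header; tools like llama.cpp read it to reconstruct the model without any external config files.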

### Prompt template: [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B/blob/main/tokenizer_config.json#L34)

```
{{system_message}}

<|User|>{{prompt}}<|Assistant|>

```
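For runtimes that don't apply the chat template embedded in the GGUF metadata automatically, the template above can be filled in by hand. A minimal sketch (the helper name is my own, not part of llama.cpp or any library):

```python
def format_prompt(prompt: str, system_message: str = "") -> str:
    """Render the DeepSeek R1 prompt template shown above."""
    # system message first, blank line, then the user/assistant turn markers
    return f"{system_message}\n\n<|User|>{prompt}<|Assistant|>"
```

The model then generates its chain-of-thought and answer after the `<|Assistant|>` marker; per DeepSeek's notes below, you no longer need to force a `<think>\n` prefix.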

### Notes from DeepSeek on Running Locally

> Compared to previous versions of DeepSeek-R1, the usage recommendations for DeepSeek-R1-0528 have the following changes:
>
> - System prompt is supported now.
> - It is not required to add `<think>\n` at the beginning of the output to force the model into thinking pattern.
>
> The model architecture of DeepSeek-R1-0528-Qwen3-8B is identical to that of Qwen3-8B, but it shares the same tokenizer configuration as DeepSeek-R1-0528.

---

## Download & run with [cnvrs](https://twitter.com/cnvrsai) on iPhone, iPad, and Mac!

![cnvrs.ai](https://pbs.twimg.com/profile_images/1744049151241797632/0mIP-P9e_400x400.jpg)

[cnvrs](https://testflight.apple.com/join/sFWReS7K) is the best app for private, local AI on your device:
- create & save **Characters** with custom system prompts & temperature settings
- download and experiment with any **GGUF model** you can [find on HuggingFace](https://huggingface.co/models?library=gguf)!
  * or, use an API key with the chat completions-compatible model provider of your choice -- ChatGPT, Claude, Gemini, DeepSeek, & more!
- make it your own with custom **Theme colors**
- powered by Metal ⚡️ & [Llama.cpp](https://github.com/ggml-org/llama.cpp), with **haptics** during response streaming!
- **try it out** yourself today, on [TestFlight](https://testflight.apple.com/join/sFWReS7K)!
  * if you **already have the app**, download DeepSeek R1 0528 Qwen3 8B now!
  * <cnvrsai:///models/search/hf?id=brittlewis12/DeepSeek-R1-0528-Qwen3-8B-GGUF>
- follow [cnvrs on twitter](https://twitter.com/cnvrsai) to stay up to date

---

## Original Model Evaluation

> We distilled the chain-of-thought from DeepSeek-R1-0528 to post-train Qwen3 8B Base, obtaining DeepSeek-R1-0528-Qwen3-8B. This model achieves state-of-the-art (SOTA) performance among open-source models on the AIME 2024, surpassing Qwen3 8B by +10.0% and matching the performance of Qwen3-235B-thinking.

|                                | AIME 24  | AIME 25  | HMMT Feb 25 | GPQA Diamond | LiveCodeBench (2408-2505) |
|--------------------------------|----------|----------|-------------|--------------|---------------------------|
| Qwen3-235B-A22B                | 85.7     | 81.5     | 62.5        | 71.1         | 66.5                      |
| Qwen3-32B                      | 81.4     | 72.9     | -           | 68.4         | -                         |
| Qwen3-8B                       | 76.0     | 67.3     | -           | 62.0         | -                         |
| Phi-4-Reasoning-Plus-14B       | 81.3     | 78.0     | 53.6        | 69.3         | -                         |
| Gemini-2.5-Flash-Thinking-0520 | 82.3     | 72.0     | 64.2        | 82.8         | 62.3                      |
| o3-mini (medium)               | 79.6     | 76.7     | 53.3        | 76.8         | 65.9                      |
| **DeepSeek-R1-0528-Qwen3-8B**  | **86.0** | **76.3** | **61.5**    | **61.1**     | **60.5**                  |
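The "+10.0%" in the quote refers to the AIME 24 column above: 86.0 for DeepSeek-R1-0528-Qwen3-8B versus 76.0 for Qwen3-8B, i.e. a gap of 10.0 percentage points rather than a 10% relative improvement. As a quick check:

```python
# AIME 24 scores taken from the table above
deepseek_r1_0528_qwen3_8b = 86.0
qwen3_8b = 76.0

gap_points = deepseek_r1_0528_qwen3_8b - qwen3_8b  # absolute gap: 10.0 points
relative_pct = 100.0 * gap_points / qwen3_8b       # relative gain: ~13.2%
```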

---

## DeepSeek R1 0528 Qwen3 8B in cnvrs on iOS

![deepseek-r1-qwen3-8b in cnvrs pt1](https://cdn-uploads.huggingface.co/production/uploads/63b64d7a889aa6707f155cdb/nsXnOaK6Sb-0PGvdY8ayy.png)
![deepseek-r1-qwen3-8b in cnvrs pt2](https://cdn-uploads.huggingface.co/production/uploads/63b64d7a889aa6707f155cdb/4AnhMFL41EuIwhKuVaCGi.png)

---