Add Q2–Q8_0 quantized models with per-model cards, MODELFILE, and auto-upload

Files changed:
- .gitattributes (+5 -0)
- MODELFILE (+25 -0)
- Qwen3-4B-Q2_K/README.md (+81 -0)
- Qwen3-4B-Q3_K_M/README.md (+81 -0)
- Qwen3-4B-Q3_K_S/README.md (+81 -0)
- Qwen3-4B-Q4_K_M/README.md (+81 -0)
- Qwen3-4B-Q4_K_S/README.md (+81 -0)
- Qwen3-4B-Q5_K_M/README.md (+81 -0)
- Qwen3-4B-Q5_K_S/README.md (+81 -0)
- Qwen3-4B-Q6_K/README.md (+81 -0)
- Qwen3-4B-Q8_0/README.md (+81 -0)
- Qwen3-4B-f16:Q2_K.gguf (+3 -0)
- Qwen3-4B-f16:Q3_K_M.gguf (+3 -0)
- Qwen3-4B-f16:Q3_K_S.gguf (+3 -0)
- Qwen3-4B-f16:Q6_K.gguf (+3 -0)
- Qwen3-4B-f16:Q8_0.gguf (+3 -0)
- README.md (+26 -49)
- SHA256SUMS.txt (+5 -1)
.gitattributes CHANGED
@@ -38,3 +38,8 @@ Qwen3-4B-f16:Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
 Qwen3-4B-f16:Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
 Qwen3-4B-f16:Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
 Qwen3-4B-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-4B-f16:Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-4B-f16:Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-4B-f16:Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-4B-f16:Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-4B-f16:Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
MODELFILE ADDED
@@ -0,0 +1,25 @@
# MODELFILE for Qwen3-4B-GGUF
# Used by LM Studio, OpenWebUI, GPT4All, etc.

context_length: 32768
embedding: false
f16: cpu

# Chat template using ChatML (used by Qwen)
prompt_template: >-
  <|im_start|>system
  You are a helpful assistant.<|im_end|>
  <|im_start|>user
  {prompt}<|im_end|>
  <|im_start|>assistant

# Stop sequences help end generation cleanly
stop: "<|im_end|>"
stop: "<|im_start|>"

# Default sampling
temperature: 0.6
top_p: 0.95
top_k: 20
min_p: 0.0
repeat_penalty: 1.1
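The sampling defaults above translate directly into a request body for an OpenAI-style chat endpoint served by llama.cpp-based runtimes. A minimal sketch; the model alias `"Qwen3-4B-GGUF"` and the exact server route are assumptions, not part of this repository:

```python
import json

# Sampling defaults copied from the MODELFILE above.
SAMPLING_DEFAULTS = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.1,
}

def build_request(prompt: str) -> dict:
    """Combine the MODELFILE defaults with a single-turn chat request."""
    return {
        "model": "Qwen3-4B-GGUF",  # hypothetical alias; depends on your server setup
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        # Stop strings from the MODELFILE, so generation ends at turn boundaries.
        "stop": ["<|im_end|>", "<|im_start|>"],
        **SAMPLING_DEFAULTS,
    }

body = json.dumps(build_request("Hello"))
```

Whether these keys are honored (`min_p` and `repeat_penalty` in particular) depends on the runtime; check your server's documentation.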
Qwen3-4B-Q2_K/README.md ADDED
@@ -0,0 +1,81 @@
---
license: apache-2.0
tags:
- gguf
- qwen
- llama.cpp
- quantized
- text-generation
- thinking-mode
base_model: Qwen/Qwen3-4B
author: geoffmunn
---

# Qwen3-4B-Q2_K

Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q2_K** level, derived from **f16** base weights.

## Model Info

- **Format**: GGUF (for llama.cpp and compatible runtimes)
- **Size**: 1.6G
- **Precision**: Q2_K
- **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
- **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)

## Quality & Performance

| Metric | Value |
|-------|-------|
| **Quality** | Only on very weak hardware |
| **Speed** | 🚀 Fast |
| **RAM Required** | |
| **Recommendation** | Only on very weak hardware; poor reasoning. Avoid if possible. |

## Prompt Template (ChatML)

This model uses the **ChatML** format used by Qwen:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

Set this in your app (LM Studio, OpenWebUI, etc.) for best results.

## Generation Parameters

Recommended defaults:

| Parameter | Value |
|---------|-------|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Min-P | 0.0 |
| Repeat Penalty | 1.1 |

Stop sequences: `<|im_end|>`, `<|im_start|>`

## Verification

Check integrity:

```bash
sha256sum -c ../SHA256SUMS.txt
```

## Usage

Compatible with:
- [LM Studio](https://lmstudio.ai)
- [OpenWebUI](https://openwebui.com)
- [GPT4All](https://gpt4all.io)
- Directly via llama.cpp

## License

Apache 2.0 – see base model for full terms.
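When a runtime does not apply the chat template automatically, the ChatML layout shown in the card above can be assembled by hand. A minimal sketch in Python (helper name is ours, not from the repository):

```python
def chatml_prompt(user_msg: str,
                  system_msg: str = "You are a helpful assistant.") -> str:
    """Render a single-turn ChatML prompt matching the card's template.

    The string ends with an open assistant turn, so the model's generation
    continues from there until it emits the <|im_end|> stop sequence.
    """
    return (
        f"<|im_start|>system\n{system_msg}<|im_end|>\n"
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
```

Pair this with the stop sequences `<|im_end|>` and `<|im_start|>` so decoding halts at the end of the assistant's turn.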
Qwen3-4B-Q3_K_M/README.md ADDED
@@ -0,0 +1,81 @@
---
license: apache-2.0
tags:
- gguf
- qwen
- llama.cpp
- quantized
- text-generation
- thinking-mode
base_model: Qwen/Qwen3-4B
author: geoffmunn
---

# Qwen3-4B-Q3_K_M

Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q3_K_M** level, derived from **f16** base weights.

## Model Info

- **Format**: GGUF (for llama.cpp and compatible runtimes)
- **Size**: 2.0G
- **Precision**: Q3_K_M
- **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
- **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)

## Quality & Performance

| Metric | Value |
|-------|-------|
| **Quality** | Acceptable for basic chat on older CPUs |
| **Speed** | 🚀 Fast |
| **RAM Required** | |
| **Recommendation** | Acceptable for basic chat on older CPUs. |

## Prompt Template (ChatML)

This model uses the **ChatML** format used by Qwen:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

Set this in your app (LM Studio, OpenWebUI, etc.) for best results.

## Generation Parameters

Recommended defaults:

| Parameter | Value |
|---------|-------|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Min-P | 0.0 |
| Repeat Penalty | 1.1 |

Stop sequences: `<|im_end|>`, `<|im_start|>`

## Verification

Check integrity:

```bash
sha256sum -c ../SHA256SUMS.txt
```

## Usage

Compatible with:
- [LM Studio](https://lmstudio.ai)
- [OpenWebUI](https://openwebui.com)
- [GPT4All](https://gpt4all.io)
- Directly via llama.cpp

## License

Apache 2.0 – see base model for full terms.
Qwen3-4B-Q3_K_S/README.md ADDED
@@ -0,0 +1,81 @@
---
license: apache-2.0
tags:
- gguf
- qwen
- llama.cpp
- quantized
- text-generation
- thinking-mode
base_model: Qwen/Qwen3-4B
author: geoffmunn
---

# Qwen3-4B-Q3_K_S

Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q3_K_S** level, derived from **f16** base weights.

## Model Info

- **Format**: GGUF (for llama.cpp and compatible runtimes)
- **Size**: 1.8G
- **Precision**: Q3_K_S
- **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
- **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)

## Quality & Performance

| Metric | Value |
|-------|-------|
| **Quality** | Minimal viable for simple tasks |
| **Speed** | 🚀 Fast |
| **RAM Required** | |
| **Recommendation** | Minimal viable for simple tasks. Avoid for reasoning. |

## Prompt Template (ChatML)

This model uses the **ChatML** format used by Qwen:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

Set this in your app (LM Studio, OpenWebUI, etc.) for best results.

## Generation Parameters

Recommended defaults:

| Parameter | Value |
|---------|-------|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Min-P | 0.0 |
| Repeat Penalty | 1.1 |

Stop sequences: `<|im_end|>`, `<|im_start|>`

## Verification

Check integrity:

```bash
sha256sum -c ../SHA256SUMS.txt
```

## Usage

Compatible with:
- [LM Studio](https://lmstudio.ai)
- [OpenWebUI](https://openwebui.com)
- [GPT4All](https://gpt4all.io)
- Directly via llama.cpp

## License

Apache 2.0 – see base model for full terms.
Qwen3-4B-Q4_K_M/README.md ADDED
@@ -0,0 +1,81 @@
---
license: apache-2.0
tags:
- gguf
- qwen
- llama.cpp
- quantized
- text-generation
- thinking-mode
base_model: Qwen/Qwen3-4B
author: geoffmunn
---

# Qwen3-4B-Q4_K_M

Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q4_K_M** level, derived from **f16** base weights.

## Model Info

- **Format**: GGUF (for llama.cpp and compatible runtimes)
- **Size**: 2.4G
- **Precision**: Q4_K_M
- **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
- **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)

## Quality & Performance

| Metric | Value |
|-------|-------|
| **Quality** | Best speed/quality balance for most users |
| **Speed** | 🚀 Fast |
| **RAM Required** | |
| **Recommendation** | Best speed/quality balance for most users. Ideal for laptops & general use. |

## Prompt Template (ChatML)

This model uses the **ChatML** format used by Qwen:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

Set this in your app (LM Studio, OpenWebUI, etc.) for best results.

## Generation Parameters

Recommended defaults:

| Parameter | Value |
|---------|-------|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Min-P | 0.0 |
| Repeat Penalty | 1.1 |

Stop sequences: `<|im_end|>`, `<|im_start|>`

## Verification

Check integrity:

```bash
sha256sum -c ../SHA256SUMS.txt
```

## Usage

Compatible with:
- [LM Studio](https://lmstudio.ai)
- [OpenWebUI](https://openwebui.com)
- [GPT4All](https://gpt4all.io)
- Directly via llama.cpp

## License

Apache 2.0 – see base model for full terms.
Qwen3-4B-Q4_K_S/README.md ADDED
@@ -0,0 +1,81 @@
---
license: apache-2.0
tags:
- gguf
- qwen
- llama.cpp
- quantized
- text-generation
- thinking-mode
base_model: Qwen/Qwen3-4B
author: geoffmunn
---

# Qwen3-4B-Q4_K_S

Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q4_K_S** level, derived from **f16** base weights.

## Model Info

- **Format**: GGUF (for llama.cpp and compatible runtimes)
- **Size**: 2.3G
- **Precision**: Q4_K_S
- **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
- **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)

## Quality & Performance

| Metric | Value |
|-------|-------|
| **Quality** | Good for low-end devices |
| **Speed** | 🚀 Fast |
| **RAM Required** | |
| **Recommendation** | Good for low-end devices; decent performance. |

## Prompt Template (ChatML)

This model uses the **ChatML** format used by Qwen:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

Set this in your app (LM Studio, OpenWebUI, etc.) for best results.

## Generation Parameters

Recommended defaults:

| Parameter | Value |
|---------|-------|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Min-P | 0.0 |
| Repeat Penalty | 1.1 |

Stop sequences: `<|im_end|>`, `<|im_start|>`

## Verification

Check integrity:

```bash
sha256sum -c ../SHA256SUMS.txt
```

## Usage

Compatible with:
- [LM Studio](https://lmstudio.ai)
- [OpenWebUI](https://openwebui.com)
- [GPT4All](https://gpt4all.io)
- Directly via llama.cpp

## License

Apache 2.0 – see base model for full terms.
Qwen3-4B-Q5_K_M/README.md ADDED
@@ -0,0 +1,81 @@
---
license: apache-2.0
tags:
- gguf
- qwen
- llama.cpp
- quantized
- text-generation
- thinking-mode
base_model: Qwen/Qwen3-4B
author: geoffmunn
---

# Qwen3-4B-Q5_K_M

Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q5_K_M** level, derived from **f16** base weights.

## Model Info

- **Format**: GGUF (for llama.cpp and compatible runtimes)
- **Size**: 2.7G
- **Precision**: Q5_K_M
- **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
- **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)

## Quality & Performance

| Metric | Value |
|-------|-------|
| **Quality** | Top choice for reasoning & coding |
| **Speed** | 🐢 Medium |
| **RAM Required** | |
| **Recommendation** | Top choice for reasoning & coding. Recommended for desktops & strong laptops. |

## Prompt Template (ChatML)

This model uses the **ChatML** format used by Qwen:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

Set this in your app (LM Studio, OpenWebUI, etc.) for best results.

## Generation Parameters

Recommended defaults:

| Parameter | Value |
|---------|-------|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Min-P | 0.0 |
| Repeat Penalty | 1.1 |

Stop sequences: `<|im_end|>`, `<|im_start|>`

## Verification

Check integrity:

```bash
sha256sum -c ../SHA256SUMS.txt
```

## Usage

Compatible with:
- [LM Studio](https://lmstudio.ai)
- [OpenWebUI](https://openwebui.com)
- [GPT4All](https://gpt4all.io)
- Directly via llama.cpp

## License

Apache 2.0 – see base model for full terms.
Qwen3-4B-Q5_K_S/README.md ADDED
@@ -0,0 +1,81 @@
---
license: apache-2.0
tags:
- gguf
- qwen
- llama.cpp
- quantized
- text-generation
- thinking-mode
base_model: Qwen/Qwen3-4B
author: geoffmunn
---

# Qwen3-4B-Q5_K_S

Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q5_K_S** level, derived from **f16** base weights.

## Model Info

- **Format**: GGUF (for llama.cpp and compatible runtimes)
- **Size**: 2.7G
- **Precision**: Q5_K_S
- **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
- **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)

## Quality & Performance

| Metric | Value |
|-------|-------|
| **Quality** | Great for reasoning |
| **Speed** | 🐢 Medium |
| **RAM Required** | |
| **Recommendation** | Great for reasoning; slightly faster than Q5_K_M. |

## Prompt Template (ChatML)

This model uses the **ChatML** format used by Qwen:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

Set this in your app (LM Studio, OpenWebUI, etc.) for best results.

## Generation Parameters

Recommended defaults:

| Parameter | Value |
|---------|-------|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Min-P | 0.0 |
| Repeat Penalty | 1.1 |

Stop sequences: `<|im_end|>`, `<|im_start|>`

## Verification

Check integrity:

```bash
sha256sum -c ../SHA256SUMS.txt
```

## Usage

Compatible with:
- [LM Studio](https://lmstudio.ai)
- [OpenWebUI](https://openwebui.com)
- [GPT4All](https://gpt4all.io)
- Directly via llama.cpp

## License

Apache 2.0 – see base model for full terms.
Qwen3-4B-Q6_K/README.md ADDED
@@ -0,0 +1,81 @@
---
license: apache-2.0
tags:
- gguf
- qwen
- llama.cpp
- quantized
- text-generation
- thinking-mode
base_model: Qwen/Qwen3-4B
author: geoffmunn
---

# Qwen3-4B-Q6_K

Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q6_K** level, derived from **f16** base weights.

## Model Info

- **Format**: GGUF (for llama.cpp and compatible runtimes)
- **Size**: 3.1G
- **Precision**: Q6_K
- **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
- **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)

## Quality & Performance

| Metric | Value |
|-------|-------|
| **Quality** | Excellent fidelity |
| **Speed** | 🐢 Medium |
| **RAM Required** | |
| **Recommendation** | Excellent fidelity; ideal for RAG, complex logic. Use if RAM allows. |

## Prompt Template (ChatML)

This model uses the **ChatML** format used by Qwen:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

Set this in your app (LM Studio, OpenWebUI, etc.) for best results.

## Generation Parameters

Recommended defaults:

| Parameter | Value |
|---------|-------|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Min-P | 0.0 |
| Repeat Penalty | 1.1 |

Stop sequences: `<|im_end|>`, `<|im_start|>`

## Verification

Check integrity:

```bash
sha256sum -c ../SHA256SUMS.txt
```

## Usage

Compatible with:
- [LM Studio](https://lmstudio.ai)
- [OpenWebUI](https://openwebui.com)
- [GPT4All](https://gpt4all.io)
- Directly via llama.cpp

## License

Apache 2.0 – see base model for full terms.
Qwen3-4B-Q8_0/README.md ADDED
@@ -0,0 +1,81 @@
---
license: apache-2.0
tags:
- gguf
- qwen
- llama.cpp
- quantized
- text-generation
- thinking-mode
base_model: Qwen/Qwen3-4B
author: geoffmunn
---

# Qwen3-4B-Q8_0

Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q8_0** level, derived from **f16** base weights.

## Model Info

- **Format**: GGUF (for llama.cpp and compatible runtimes)
- **Size**: 4.0G
- **Precision**: Q8_0
- **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
- **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)

## Quality & Performance

| Metric | Value |
|-------|-------|
| **Quality** | Highest quality without FP16 |
| **Speed** | 🐢 Medium |
| **RAM Required** | |
| **Recommendation** | Highest quality without FP16; perfect for accuracy-critical tasks. |

## Prompt Template (ChatML)

This model uses the **ChatML** format used by Qwen:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

Set this in your app (LM Studio, OpenWebUI, etc.) for best results.

## Generation Parameters

Recommended defaults:

| Parameter | Value |
|---------|-------|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Min-P | 0.0 |
| Repeat Penalty | 1.1 |

Stop sequences: `<|im_end|>`, `<|im_start|>`

## Verification

Check integrity:

```bash
sha256sum -c ../SHA256SUMS.txt
```

## Usage

Compatible with:
- [LM Studio](https://lmstudio.ai)
- [OpenWebUI](https://openwebui.com)
- [GPT4All](https://gpt4all.io)
- Directly via llama.cpp

## License

Apache 2.0 – see base model for full terms.
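The `sha256sum -c SHA256SUMS.txt` check used throughout these cards can be reproduced on platforms without GNU coreutils. A sketch in Python, assuming the checksum file uses the standard `<hex-digest>  <filename>` layout:

```python
import hashlib
from pathlib import Path

def verify(sums_file: str = "SHA256SUMS.txt") -> dict:
    """Mimic `sha256sum -c`: return {filename: bool} for files present locally."""
    results = {}
    for line in Path(sums_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        path = Path(name.strip())
        if not path.exists():
            continue  # skip entries for files you have not downloaded
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        results[path.name] = (digest == expected)
    return results
```

Reading multi-gigabyte GGUF files with `read_bytes()` loads them fully into memory; for very large files, hashing in chunks via `hashlib.sha256()` and `update()` is gentler.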
Qwen3-4B-f16:Q2_K.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5e3f5acb28ecd1d689294d2b574b0a4e07a3ee5d4c8f618def7c47b3df9f85c5
size 1669499616
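The `.gguf` entries in this commit are Git LFS pointer files rather than the binaries themselves; the actual weights live in LFS storage. The three-line `key value` pointer format is easy to parse (a sketch, not an official LFS client):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs v1 pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is the binary's byte count
    return fields

# The Q2_K pointer from this commit:
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:5e3f5acb28ecd1d689294d2b574b0a4e07a3ee5d4c8f618def7c47b3df9f85c5
size 1669499616"""
info = parse_lfs_pointer(pointer)
```

`info["size"]` gives the true download size (here ~1.6 GB), which is why the diff shows only +3 lines per model file.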
Qwen3-4B-f16:Q3_K_M.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6628f3ccc26094007223013611c02ecd39ffbbdf5a88568a977f45ff10aca4ef
size 2075618016
Qwen3-4B-f16:Q3_K_S.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:41724cf25bd576fe5ac57cb3df87409aebae164374ed81748dd2ee3f28b27913
size 1886997216
Qwen3-4B-f16:Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9cd01e523a1c16c9855c96af29a3ce8a0e44e762e5b9984d1397ee64bb96c8db
size 3306261216
Qwen3-4B-f16:Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fee4d9cdff7bf4e43f88efccb8b7ddeaabfd3da47f5030704195e4e67b68e2e4
size 4280405216
README.md
CHANGED
@@ -13,53 +13,32 @@ author: geoffmunn
 
 # Qwen3-4B-GGUF
 
-This is a **GGUF-quantized version** of the **[Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)** language model, converted for use with `llama.cpp` and compatible inference engines (e.g., OpenWebUI, LM Studio).
-
-## Model Details
-
-- **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
-- **Conversion Tool**: [`llama.cpp`](https://github.com/ggerganov/llama.cpp)
-- **Architecture**: Causal Language Model
-- **License**: apache-2.0 (see base model for details)
+This is a **GGUF-quantized version** of the **[Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)** language model, converted for use with `llama.cpp` and compatible inference engines (e.g., OpenWebUI, LM Studio, GPT4All).
 
 ## Available Quantizations (from f16)
 
 The following variants were built starting from an **f16** base model, ensuring consistent quality across all versions.
 
-| Level | Quality
->
-> - Lower temp (~0.2–0.4) for coding or factual QA.
-> - Higher temp (~0.7–0.9) for brainstorming or creative writing.
-
-```bash
-curl http://192.168.1.10:11434/api/generate -s -N -d '{
-  "model": "hf.co/geoffmunn/Qwen3-4B-GGUF:Q4_K_M",
-  "prompt": "A bat and a ball cost 1.10 together. The bat costs 1.00 more than the ball. How much does the ball cost?",
-  "temperature": 0.6,
-  "top_p": 0.95,
-  "top_k": 20,
-  "min_p": 0,
-  "stream": false
-}' | jq
-```
+| Level   | Quality      | Speed     | Size Est. | Recommendation |
+|---------|--------------|-----------|-----------|----------------|
+| Q2_K    | Very Low     | ⚡ Fastest | ~2.1 GB   | Only on very weak hardware; poor reasoning. Avoid if possible. |
+| Q3_K_S  | Low          | ⚡ Fast    | ~2.5 GB   | Minimal viable for simple tasks. Avoid for reasoning. |
+| Q3_K_M  | Low-Medium   | ⚡ Fast    | ~2.8 GB   | Acceptable for basic chat on older CPUs. |
+| Q4_K_S  | Medium       | 🚀 Fast    | ~3.0 GB   | Good for low-end devices; decent performance. |
+| Q4_K_M  | ✅ Balanced   | 🚀 Fast    | ~3.2 GB   | Best speed/quality balance for most users. Ideal for laptops & general use. |
+| Q5_K_S  | High         | 🐢 Medium  | ~3.4 GB   | Great for reasoning; slightly faster than Q5_K_M. |
+| Q5_K_M  | ✅✅ High     | 🐢 Medium  | ~3.6 GB   | Top choice for reasoning & coding. Recommended for desktops & strong laptops. |
+| Q6_K    | 🔥 Near-FP16  | 🐌 Slow    | ~4.2 GB   | Excellent fidelity; ideal for RAG, complex logic. Use if RAM allows. |
+| Q8_0    | 🏆 Lossless*  | 🐌 Slow    | ~4.8 GB   | Highest quality without FP16; perfect for accuracy-critical tasks. Recommended when full fidelity is needed. |
+
+> 💡 **Recommendations by Use Case**
+>
+> - 💻 **Low-end CPU / Mac Mini / Old Laptop**: `Q4_K_M`
+> - 🖥️ **Standard Laptop (M1/M2 Mac, i5/i7)**: `Q5_K_M` (best overall)
+> - 🧠 **Reasoning, Coding, Math**: `Q6_K` or `Q8_0`
+> - 🔍 **RAG, Retrieval, Precision Tasks**: `Q8_0`
+> - 📦 **Storage-Constrained Devices**: `Q4_K_S` or `Q4_K_M`
+> - 🛠️ **Development & Testing**: Always test across Q4_K_M → Q8_0 to validate robustness.
 
 ## Usage
 
@@ -67,19 +46,17 @@ Load this model using:
 - [OpenWebUI](https://openwebui.com)
 - [LM Studio](https://lmstudio.ai)
 - [GPT4All](https://gpt4all.io)
-- Or directly via `llama.cpp
-
-./main -m Qwen3-4B-f16-Q5_K_M.gguf -p "Explain quantum entanglement simply."
-```
+- Or directly via `llama.cpp`
+
+Each model includes its own `README.md` and `MODELFILE` for optimal configuration.
 
 ## Verification
 
+Use `SHA256SUMS.txt` to verify file integrity:
+
+```bash
 sha256sum -c SHA256SUMS.txt
+```
 
 ## Author
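The size column in the updated README table can drive a simple selection rule: pick the highest-quality quantization whose file fits your memory budget. A rough sketch under that assumption — file size is only a proxy for actual RAM use, which also depends on context length and KV cache, and `best_quant` is a name of my own invention:

```python
# Approximate file sizes (GB) copied from the quantization table in this commit.
SIZES_GB = {
    "Q2_K": 2.1, "Q3_K_S": 2.5, "Q3_K_M": 2.8,
    "Q4_K_S": 3.0, "Q4_K_M": 3.2, "Q5_K_S": 3.4,
    "Q5_K_M": 3.6, "Q6_K": 4.2, "Q8_0": 4.8,
}

def best_quant(budget_gb: float):
    """Return the largest (highest-quality) quant whose file fits the budget."""
    fitting = {q: s for q, s in SIZES_GB.items() if s <= budget_gb}
    return max(fitting, key=fitting.get) if fitting else None

print(best_quant(4.0))   # Q5_K_M — matches the "standard laptop" recommendation
print(best_quant(8.0))   # Q8_0
print(best_quant(1.0))   # None — nothing fits
```

Under these numbers, a ~4 GB budget lands on `Q5_K_M`, which agrees with the README's own per-use-case recommendations.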
SHA256SUMS.txt
CHANGED
@@ -1,5 +1,9 @@
+5e3f5acb28ecd1d689294d2b574b0a4e07a3ee5d4c8f618def7c47b3df9f85c5  Qwen3-4B-f16:Q2_K.gguf
+6628f3ccc26094007223013611c02ecd39ffbbdf5a88568a977f45ff10aca4ef  Qwen3-4B-f16:Q3_K_M.gguf
+41724cf25bd576fe5ac57cb3df87409aebae164374ed81748dd2ee3f28b27913  Qwen3-4B-f16:Q3_K_S.gguf
 94a57f361a039e16250669511948ad87d4a52da94930a7a4b215db14f7b7da45  Qwen3-4B-f16:Q4_K_M.gguf
 7e6525fa15695cd2cd2d3112eacd38f775e4a7b9630518aa76f55506755937b6  Qwen3-4B-f16:Q4_K_S.gguf
 39399c8ec5a1d77b656b968161c4f4ea29ef51e63a0a9c4c657f4d379c5cec8d  Qwen3-4B-f16:Q5_K_M.gguf
 0df57f3ef40f374dac3263bb6bb567adf865c56e80f0aeed9ff09cd8e36ff5a7  Qwen3-4B-f16:Q5_K_S.gguf
-
+9cd01e523a1c16c9855c96af29a3ce8a0e44e762e5b9984d1397ee64bb96c8db  Qwen3-4B-f16:Q6_K.gguf
+fee4d9cdff7bf4e43f88efccb8b7ddeaabfd3da47f5030704195e4e67b68e2e4  Qwen3-4B-f16:Q8_0.gguf
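`sha256sum -c` is a GNU coreutils tool and may not be available everywhere (stock macOS ships `shasum` instead). The same check can be sketched in Python with only the standard library — `verify_sums` is my own name for the helper:

```python
# Sketch: re-implement the core of `sha256sum -c` for a SHA256SUMS-style file.
# `verify_sums` is a hypothetical helper, not part of any existing tool.
import hashlib
from pathlib import Path

def verify_sums(sums_path, root="."):
    """Check each '<hex digest>  <filename>' line; return {filename: bool}."""
    results = {}
    for line in Path(sums_path).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        h = hashlib.sha256()
        with open(Path(root) / name, "rb") as f:
            # Hash in 1 MiB chunks so multi-GB GGUF files don't fill RAM.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        results[name] = (h.hexdigest() == expected.lower())
    return results
```

The returned dict mirrors `sha256sum`'s per-file OK/FAILED output; any `False` value means the corresponding GGUF download is corrupt or incomplete.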