geoffmunn committed
Commit c1b1cc8 · verified · 1 Parent(s): 509ef55

Add Q2–Q8_0 quantized models with per-model cards, MODELFILE, and auto-upload
.gitattributes CHANGED
@@ -38,3 +38,8 @@ Qwen3-4B-f16:Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
  Qwen3-4B-f16:Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
  Qwen3-4B-f16:Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
  Qwen3-4B-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-4B-f16:Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-4B-f16:Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-4B-f16:Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-4B-f16:Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-4B-f16:Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
MODELFILE ADDED
@@ -0,0 +1,25 @@
+ # MODELFILE for Qwen3-4B-GGUF
+ # Used by LM Studio, OpenWebUI, GPT4All, etc.
+
+ context_length: 32768
+ embedding: false
+ f16: cpu
+
+ # Chat template using ChatML (used by Qwen)
+ prompt_template: >-
+   <|im_start|>system
+   You are a helpful assistant.<|im_end|>
+   <|im_start|>user
+   {prompt}<|im_end|>
+   <|im_start|>assistant
+
+ # Stop sequences help end generation cleanly
+ stop: "<|im_end|>"
+ stop: "<|im_start|>"
+
+ # Default sampling settings
+ temperature: 0.6
+ top_p: 0.95
+ top_k: 20
+ min_p: 0.0
+ repeat_penalty: 1.1
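For illustration, here is how a runtime that consumes this MODELFILE might expand the `prompt_template` above; a minimal Python sketch (the helper name is ours, not part of any listed app):

```python
# Sketch: how a runtime expands the ChatML prompt_template above.
# {prompt} is filled with the user's message; generation stops when a
# stop sequence such as <|im_end|> is emitted.
CHATML_TEMPLATE = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "{prompt}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

STOP_SEQUENCES = ["<|im_end|>", "<|im_start|>"]

def build_prompt(user_message: str) -> str:
    """Render the ChatML template for a single-turn request."""
    return CHATML_TEMPLATE.format(prompt=user_message)

print(build_prompt("What is GGUF?"))
```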
Qwen3-4B-Q2_K/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - thinking-mode
+ base_model: Qwen/Qwen3-4B
+ author: geoffmunn
+ ---
+
+ # Qwen3-4B-Q2_K
+
+ Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q2_K** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 1.6G
+ - **Precision**: Q2_K
+ - **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |-------|-------|
+ | **Quality** | Very low |
+ | **Speed** | ⚡ Fastest |
+ | **RAM Required** | ~2.2 GB (estimated) |
+ | **Recommendation** | Only on very weak hardware; poor reasoning. Avoid if possible. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** prompt format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |---------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see base model for full terms.
Qwen3-4B-Q3_K_M/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - thinking-mode
+ base_model: Qwen/Qwen3-4B
+ author: geoffmunn
+ ---
+
+ # Qwen3-4B-Q3_K_M
+
+ Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q3_K_M** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 2.0G
+ - **Precision**: Q3_K_M
+ - **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |-------|-------|
+ | **Quality** | Low–medium |
+ | **Speed** | 🚀 Fast |
+ | **RAM Required** | ~2.6 GB (estimated) |
+ | **Recommendation** | Acceptable for basic chat on older CPUs. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** prompt format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |---------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see base model for full terms.
Qwen3-4B-Q3_K_S/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - thinking-mode
+ base_model: Qwen/Qwen3-4B
+ author: geoffmunn
+ ---
+
+ # Qwen3-4B-Q3_K_S
+
+ Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q3_K_S** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 1.8G
+ - **Precision**: Q3_K_S
+ - **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |-------|-------|
+ | **Quality** | Low |
+ | **Speed** | 🚀 Fast |
+ | **RAM Required** | ~2.4 GB (estimated) |
+ | **Recommendation** | Minimally viable for simple tasks; avoid for reasoning. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** prompt format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |---------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see base model for full terms.
Qwen3-4B-Q4_K_M/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - thinking-mode
+ base_model: Qwen/Qwen3-4B
+ author: geoffmunn
+ ---
+
+ # Qwen3-4B-Q4_K_M
+
+ Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q4_K_M** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 2.4G
+ - **Precision**: Q4_K_M
+ - **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |-------|-------|
+ | **Quality** | ✅ Balanced |
+ | **Speed** | 🚀 Fast |
+ | **RAM Required** | ~2.9 GB (estimated) |
+ | **Recommendation** | Best speed/quality balance for most users. Ideal for laptops & general use. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** prompt format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |---------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see base model for full terms.
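The recommended defaults in these cards map directly onto the sampling options most GGUF runtimes expose. As an illustration (the dict name is ours, and the commented llama-cpp-python call is an assumption about your setup, not part of this repo):

```python
# Sampling defaults from the Generation Parameters table, gathered in one
# place so they can be passed to whichever runtime loads the GGUF file.
GENERATION_DEFAULTS = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.1,
    "stop": ["<|im_end|>", "<|im_start|>"],
}

# Example use with llama-cpp-python (not run here; requires the model file):
# from llama_cpp import Llama
# llm = Llama(model_path="Qwen3-4B-f16:Q4_K_M.gguf", n_ctx=32768)
# out = llm("Hello", **GENERATION_DEFAULTS)

print(sorted(GENERATION_DEFAULTS))
```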
Qwen3-4B-Q4_K_S/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - thinking-mode
+ base_model: Qwen/Qwen3-4B
+ author: geoffmunn
+ ---
+
+ # Qwen3-4B-Q4_K_S
+
+ Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q4_K_S** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 2.3G
+ - **Precision**: Q4_K_S
+ - **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |-------|-------|
+ | **Quality** | Medium |
+ | **Speed** | 🚀 Fast |
+ | **RAM Required** | ~2.8 GB (estimated) |
+ | **Recommendation** | Good for low-end devices; decent performance. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** prompt format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |---------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see base model for full terms.
Qwen3-4B-Q5_K_M/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - thinking-mode
+ base_model: Qwen/Qwen3-4B
+ author: geoffmunn
+ ---
+
+ # Qwen3-4B-Q5_K_M
+
+ Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q5_K_M** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 2.7G
+ - **Precision**: Q5_K_M
+ - **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |-------|-------|
+ | **Quality** | ✅✅ High |
+ | **Speed** | 🐢 Medium |
+ | **RAM Required** | ~3.2 GB (estimated) |
+ | **Recommendation** | Top choice for reasoning & coding. Recommended for desktops & strong laptops. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** prompt format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |---------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see base model for full terms.
Qwen3-4B-Q5_K_S/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - thinking-mode
+ base_model: Qwen/Qwen3-4B
+ author: geoffmunn
+ ---
+
+ # Qwen3-4B-Q5_K_S
+
+ Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q5_K_S** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 2.7G
+ - **Precision**: Q5_K_S
+ - **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |-------|-------|
+ | **Quality** | High |
+ | **Speed** | 🐢 Medium |
+ | **RAM Required** | ~3.2 GB (estimated) |
+ | **Recommendation** | Great for reasoning; slightly faster than Q5_K_M. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** prompt format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |---------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see base model for full terms.
Qwen3-4B-Q6_K/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - thinking-mode
+ base_model: Qwen/Qwen3-4B
+ author: geoffmunn
+ ---
+
+ # Qwen3-4B-Q6_K
+
+ Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q6_K** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 3.1G
+ - **Precision**: Q6_K
+ - **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |-------|-------|
+ | **Quality** | 🔥 Near-FP16 |
+ | **Speed** | 🐌 Slow |
+ | **RAM Required** | ~3.8 GB (estimated) |
+ | **Recommendation** | Excellent fidelity; ideal for RAG, complex logic. Use if RAM allows. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** prompt format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |---------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see base model for full terms.
Qwen3-4B-Q8_0/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - thinking-mode
+ base_model: Qwen/Qwen3-4B
+ author: geoffmunn
+ ---
+
+ # Qwen3-4B-Q8_0
+
+ Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q8_0** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 4.0G
+ - **Precision**: Q8_0
+ - **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |-------|-------|
+ | **Quality** | 🏆 Near-lossless |
+ | **Speed** | 🐌 Slow |
+ | **RAM Required** | ~4.8 GB (estimated) |
+ | **Recommendation** | Highest quality without FP16; perfect for accuracy-critical tasks. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** prompt format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |---------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see base model for full terms.
Qwen3-4B-f16:Q2_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5e3f5acb28ecd1d689294d2b574b0a4e07a3ee5d4c8f618def7c47b3df9f85c5
+ size 1669499616
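The `.gguf` entries in this commit are git-lfs pointer files in the three-line key/value format shown above. A minimal sketch parsing one (the helper is ours, for illustration only):

```python
# Sketch: parse a git-lfs pointer file (version / oid / size lines)
# into a dict, converting the byte count to an integer.
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:5e3f5acb28ecd1d689294d2b574b0a4e07a3ee5d4c8f618def7c47b3df9f85c5
size 1669499616"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # 1669499616 (the Q2_K file, ~1.6 GB)
```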
Qwen3-4B-f16:Q3_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6628f3ccc26094007223013611c02ecd39ffbbdf5a88568a977f45ff10aca4ef
+ size 2075618016
Qwen3-4B-f16:Q3_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:41724cf25bd576fe5ac57cb3df87409aebae164374ed81748dd2ee3f28b27913
+ size 1886997216
Qwen3-4B-f16:Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9cd01e523a1c16c9855c96af29a3ce8a0e44e762e5b9984d1397ee64bb96c8db
+ size 3306261216
Qwen3-4B-f16:Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fee4d9cdff7bf4e43f88efccb8b7ddeaabfd3da47f5030704195e4e67b68e2e4
+ size 4280405216
README.md CHANGED
@@ -13,53 +13,32 @@ author: geoffmunn
 
  # Qwen3-4B-GGUF
 
- This is a **GGUF-quantized version** of the **[Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)** language model, converted for use with `llama.cpp` and compatible inference engines (e.g., OpenWebUI, LM Studio).
+ This is a **GGUF-quantized version** of the **[Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)** language model, converted for use with `llama.cpp` and compatible inference engines (e.g., OpenWebUI, LM Studio, GPT4All).
-
- ## Model Details
-
- - **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
- - **Conversion Tool**: [`llama.cpp`](https://github.com/ggerganov/llama.cpp)
- - **Architecture**: Causal Language Model
- - **License**: apache-2.0 (see base model for details)
 
  ## Available Quantizations (from f16)
 
  The following variants were built starting from an **f16** base model, ensuring consistent quality across all versions.
 
- | Level | Quality | Speed | Size Est. |
- |----------|--------|-------|----------|
- | Q4_K_S | Lower | Fast | ~3.0 GB |
- | Q4_K_M | Balanced | Fast | ~3.2 GB |
- | Q5_K_S | High | Slower | ~3.6 GB |
- | Q5_K_M | ✅✅ High | Medium | ~3.8 GB |
-
- > 💡 Tip: Use `Q5_K_M` for best quality/speed balance ideal for reasoning tasks.
-
- ## Generation Parameters
-
- | Parameter | Value | Meaning |
- |---------------|---------|--------|
- | **Temperature** | `0.6` | Controls randomness. A moderate value like `0.6` balances creativity and consistency. |
- | **Top-P (nucleus)** | `0.95` | Dynamically selects top tokens covering 95% probability mass. |
- | **Top-K** | `20` | Only considers top 20 most probable tokens. |
- | **Min-P** | `0.0` | Minimum threshold relative to top token. Compatible with advanced samplers. |
- | **Stream** | `false` | Set to `true` for real-time token streaming. |
-
- > 🛠 You can set these in your app depending on use case:
- > - Lower temp (~0.2–0.4) for coding or factual QA.
- > - Higher temp (~0.7–0.9) for brainstorming or creative writing.
-
- ```bash
- curl http://192.168.1.10:11434/api/generate -s -N -d '{
-   "model": "hf.co/geoffmunn/Qwen3-4B-GGUF:Q4_K_M",
-   "prompt": "A bat and a ball cost 1.10 together. The bat costs 1.00 more than the ball. How much does the ball cost?",
-   "temperature": 0.6,
-   "top_p": 0.95,
-   "top_k": 20,
-   "min_p": 0,
-   "stream": false
- }' | jq
- ```
+ | Level | Quality | Speed | Size Est. | Recommendation |
+ |----------|--------------|----------|-----------|----------------|
+ | Q2_K | Very Low | ⚡ Fastest | ~1.6 GB | Only on very weak hardware; poor reasoning. Avoid if possible. |
+ | Q3_K_S | Low | Fast | ~1.8 GB | Minimally viable for simple tasks; avoid for reasoning. |
+ | Q3_K_M | Low-Medium | Fast | ~2.0 GB | Acceptable for basic chat on older CPUs. |
+ | Q4_K_S | Medium | 🚀 Fast | ~2.3 GB | Good for low-end devices; decent performance. |
+ | Q4_K_M | ✅ Balanced | 🚀 Fast | ~2.4 GB | Best speed/quality balance for most users. Ideal for laptops & general use. |
+ | Q5_K_S | High | 🐢 Medium | ~2.7 GB | Great for reasoning; slightly faster than Q5_K_M. |
+ | Q5_K_M | ✅✅ High | 🐢 Medium | ~2.7 GB | Top choice for reasoning & coding. Recommended for desktops & strong laptops. |
+ | Q6_K | 🔥 Near-FP16 | 🐌 Slow | ~3.1 GB | Excellent fidelity; ideal for RAG, complex logic. Use if RAM allows. |
+ | Q8_0 | 🏆 Near-lossless | 🐌 Slow | ~4.0 GB | Highest quality without FP16; perfect for accuracy-critical tasks. Recommended when full fidelity is needed. |
+
+ > 💡 **Recommendations by Use Case**
+ >
+ > - 💻 **Low-end CPU / Mac Mini / Old Laptop**: `Q4_K_M`
+ > - 🖥️ **Standard Laptop (M1/M2 Mac, i5/i7)**: `Q5_K_M` (best overall)
+ > - 🧠 **Reasoning, Coding, Math**: `Q6_K` or `Q8_0`
+ > - 🔍 **RAG, Retrieval, Precision Tasks**: `Q8_0`
+ > - 📦 **Storage-Constrained Devices**: `Q4_K_S` or `Q4_K_M`
+ > - 🛠️ **Development & Testing**: Always test across Q4_K_M–Q8_0 to validate robustness.
 
  ## Usage
 
@@ -67,19 +46,17 @@ Load this model using:
  - [OpenWebUI](https://openwebui.com)
  - [LM Studio](https://lmstudio.ai)
  - [GPT4All](https://gpt4all.io)
- - Or directly via `llama.cpp`:
+ - Or directly via `llama.cpp`
 
- ```bash
- ./main -m Qwen3-4B-f16-Q5_K_M.gguf -p "Explain quantum entanglement simply."
- ```
+ Each model includes its own `README.md` and `MODELFILE` for optimal configuration.
 
  ## Verification
 
- This repo includes `SHA256SUMS.txt` to verify file integrity after download:
+ Use `SHA256SUMS.txt` to verify file integrity:
 
  ```bash
  sha256sum -c SHA256SUMS.txt
  ```
 
  ## Author
 
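The size estimates in the quantization table can be sanity-checked against the LFS byte counts elsewhere in this commit, and turned into a ballpark RAM figure. A small sketch (the +0.5 GB overhead for KV cache and runtime state is our assumption for illustration, not a measured number):

```python
# Rough sizing helper: convert the LFS byte counts from this commit to GB
# and add a ballpark overhead for KV cache / runtime state. The 0.5 GB
# overhead is an assumed figure, not a measurement.
QUANT_SIZES_BYTES = {
    "Q2_K": 1_669_499_616,
    "Q3_K_S": 1_886_997_216,
    "Q3_K_M": 2_075_618_016,
    "Q6_K": 3_306_261_216,
    "Q8_0": 4_280_405_216,
}

def rough_ram_gb(size_bytes: int, overhead_gb: float = 0.5) -> float:
    """Estimated resident memory: model file size plus assumed overhead."""
    return round(size_bytes / 1e9 + overhead_gb, 1)

for level, size in QUANT_SIZES_BYTES.items():
    print(level, rough_ram_gb(size), "GB")
```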
SHA256SUMS.txt CHANGED
@@ -1,5 +1,9 @@
+ 5e3f5acb28ecd1d689294d2b574b0a4e07a3ee5d4c8f618def7c47b3df9f85c5  Qwen3-4B-f16:Q2_K.gguf
+ 6628f3ccc26094007223013611c02ecd39ffbbdf5a88568a977f45ff10aca4ef  Qwen3-4B-f16:Q3_K_M.gguf
+ 41724cf25bd576fe5ac57cb3df87409aebae164374ed81748dd2ee3f28b27913  Qwen3-4B-f16:Q3_K_S.gguf
  94a57f361a039e16250669511948ad87d4a52da94930a7a4b215db14f7b7da45  Qwen3-4B-f16:Q4_K_M.gguf
  7e6525fa15695cd2cd2d3112eacd38f775e4a7b9630518aa76f55506755937b6  Qwen3-4B-f16:Q4_K_S.gguf
  39399c8ec5a1d77b656b968161c4f4ea29ef51e63a0a9c4c657f4d379c5cec8d  Qwen3-4B-f16:Q5_K_M.gguf
  0df57f3ef40f374dac3263bb6bb567adf865c56e80f0aeed9ff09cd8e36ff5a7  Qwen3-4B-f16:Q5_K_S.gguf
- fee4d9cdff7bf4e43f88efccb8b7ddeaabfd3da47f5030704195e4e67b68e2e4  Qwen3-4B-q8_0.gguf
+ 9cd01e523a1c16c9855c96af29a3ce8a0e44e762e5b9984d1397ee64bb96c8db  Qwen3-4B-f16:Q6_K.gguf
+ fee4d9cdff7bf4e43f88efccb8b7ddeaabfd3da47f5030704195e4e67b68e2e4  Qwen3-4B-f16:Q8_0.gguf
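The `sha256sum -c` check used throughout these cards can also be mirrored in plain Python where the `coreutils` tool is unavailable. A self-contained sketch (the function name is ours; a real run would point `root` at the download directory):

```python
import hashlib
import tempfile
from pathlib import Path

def verify_manifest(manifest_text: str, root: Path) -> dict:
    """Check files under `root` against SHA256SUMS.txt-style lines
    ("<hex digest>  <filename>"), mirroring `sha256sum -c`."""
    results = {}
    for line in manifest_text.strip().splitlines():
        digest, name = line.split(maxsplit=1)
        path = root / name
        if not path.exists():
            results[name] = "MISSING"
        else:
            actual = hashlib.sha256(path.read_bytes()).hexdigest()
            results[name] = "OK" if actual == digest else "FAILED"
    return results

# Demo with a throwaway file so the sketch runs anywhere.
root = Path(tempfile.mkdtemp())
(root / "demo.gguf").write_bytes(b"hello")
manifest = hashlib.sha256(b"hello").hexdigest() + "  demo.gguf"
print(verify_manifest(manifest, root))  # {'demo.gguf': 'OK'}
```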