geoffmunn committed
Commit c1b1cc8 · verified · 1 Parent(s): 509ef55

Add Q2–Q8_0 quantized models with per-model cards, MODELFILE, and auto-upload
.gitattributes CHANGED
@@ -38,3 +38,8 @@ Qwen3-4B-f16:Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
  Qwen3-4B-f16:Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
  Qwen3-4B-f16:Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
  Qwen3-4B-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-4B-f16:Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-4B-f16:Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-4B-f16:Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-4B-f16:Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-4B-f16:Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
MODELFILE ADDED
@@ -0,0 +1,25 @@
+ # MODELFILE for Qwen3-4B-GGUF
+ # Used by LM Studio, OpenWebUI, GPT4All, etc.
+
+ context_length: 32768
+ embedding: false
+ f16: cpu
+
+ # Chat template using ChatML (used by Qwen)
+ prompt_template: >-
+   <|im_start|>system
+   You are a helpful assistant.<|im_end|>
+   <|im_start|>user
+   {prompt}<|im_end|>
+   <|im_start|>assistant
+
+ # Stop sequences help end generation cleanly
+ stop: "<|im_end|>"
+ stop: "<|im_start|>"
+
+ # Default sampling settings
+ temperature: 0.6
+ top_p: 0.95
+ top_k: 20
+ min_p: 0.0
+ repeat_penalty: 1.1
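For illustration, here is how a runtime that consumes this MODELFILE might expand the `prompt_template` above; a minimal Python sketch (the helper name is ours, not part of any listed app):

```python
# Sketch: how a runtime expands the ChatML prompt_template above.
# {prompt} is filled with the user's message; generation stops when a
# stop sequence such as <|im_end|> is emitted.
CHATML_TEMPLATE = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "{prompt}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

STOP_SEQUENCES = ["<|im_end|>", "<|im_start|>"]

def build_prompt(user_message: str) -> str:
    """Render the ChatML template for a single-turn request."""
    return CHATML_TEMPLATE.format(prompt=user_message)

print(build_prompt("What is GGUF?"))
```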
Qwen3-4B-Q2_K/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - thinking-mode
+ base_model: Qwen/Qwen3-4B
+ author: geoffmunn
+ ---
+
+ # Qwen3-4B-Q2_K
+
+ Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q2_K** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 1.6G
+ - **Precision**: Q2_K
+ - **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |-------|-------|
+ | **Quality** | Very low |
+ | **Speed** | ⚡ Fastest |
+ | **RAM Required** | ~2.2 GB (estimated) |
+ | **Recommendation** | Only on very weak hardware; poor reasoning. Avoid if possible. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** prompt format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |---------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see base model for full terms.
Qwen3-4B-Q3_K_M/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - thinking-mode
+ base_model: Qwen/Qwen3-4B
+ author: geoffmunn
+ ---
+
+ # Qwen3-4B-Q3_K_M
+
+ Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q3_K_M** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 2.0G
+ - **Precision**: Q3_K_M
+ - **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |-------|-------|
+ | **Quality** | Low–medium |
+ | **Speed** | 🚀 Fast |
+ | **RAM Required** | ~2.6 GB (estimated) |
+ | **Recommendation** | Acceptable for basic chat on older CPUs. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** prompt format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |---------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see base model for full terms.
Qwen3-4B-Q3_K_S/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - thinking-mode
+ base_model: Qwen/Qwen3-4B
+ author: geoffmunn
+ ---
+
+ # Qwen3-4B-Q3_K_S
+
+ Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q3_K_S** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 1.8G
+ - **Precision**: Q3_K_S
+ - **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |-------|-------|
+ | **Quality** | Low |
+ | **Speed** | 🚀 Fast |
+ | **RAM Required** | ~2.4 GB (estimated) |
+ | **Recommendation** | Minimally viable for simple tasks; avoid for reasoning. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** prompt format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |---------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see base model for full terms.
Qwen3-4B-Q4_K_M/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - thinking-mode
+ base_model: Qwen/Qwen3-4B
+ author: geoffmunn
+ ---
+
+ # Qwen3-4B-Q4_K_M
+
+ Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q4_K_M** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 2.4G
+ - **Precision**: Q4_K_M
+ - **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |-------|-------|
+ | **Quality** | ✅ Balanced |
+ | **Speed** | 🚀 Fast |
+ | **RAM Required** | ~2.9 GB (estimated) |
+ | **Recommendation** | Best speed/quality balance for most users. Ideal for laptops & general use. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** prompt format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |---------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see base model for full terms.
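The recommended defaults in these cards map directly onto the sampling options most GGUF runtimes expose. As an illustration (the dict name is ours, and the commented llama-cpp-python call is an assumption about your setup, not part of this repo):

```python
# Sampling defaults from the Generation Parameters table, gathered in one
# place so they can be passed to whichever runtime loads the GGUF file.
GENERATION_DEFAULTS = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.1,
    "stop": ["<|im_end|>", "<|im_start|>"],
}

# Example use with llama-cpp-python (not run here; requires the model file):
# from llama_cpp import Llama
# llm = Llama(model_path="Qwen3-4B-f16:Q4_K_M.gguf", n_ctx=32768)
# out = llm("Hello", **GENERATION_DEFAULTS)

print(sorted(GENERATION_DEFAULTS))
```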
Qwen3-4B-Q4_K_S/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - thinking-mode
+ base_model: Qwen/Qwen3-4B
+ author: geoffmunn
+ ---
+
+ # Qwen3-4B-Q4_K_S
+
+ Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q4_K_S** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 2.3G
+ - **Precision**: Q4_K_S
+ - **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |-------|-------|
+ | **Quality** | Medium |
+ | **Speed** | 🚀 Fast |
+ | **RAM Required** | ~2.8 GB (estimated) |
+ | **Recommendation** | Good for low-end devices; decent performance. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** prompt format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |---------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see base model for full terms.
Qwen3-4B-Q5_K_M/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - thinking-mode
+ base_model: Qwen/Qwen3-4B
+ author: geoffmunn
+ ---
+
+ # Qwen3-4B-Q5_K_M
+
+ Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q5_K_M** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 2.7G
+ - **Precision**: Q5_K_M
+ - **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |-------|-------|
+ | **Quality** | ✅✅ High |
+ | **Speed** | 🐢 Medium |
+ | **RAM Required** | ~3.2 GB (estimated) |
+ | **Recommendation** | Top choice for reasoning & coding. Recommended for desktops & strong laptops. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** prompt format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |---------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see base model for full terms.
Qwen3-4B-Q5_K_S/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - thinking-mode
+ base_model: Qwen/Qwen3-4B
+ author: geoffmunn
+ ---
+
+ # Qwen3-4B-Q5_K_S
+
+ Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q5_K_S** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 2.7G
+ - **Precision**: Q5_K_S
+ - **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |-------|-------|
+ | **Quality** | High |
+ | **Speed** | 🐢 Medium |
+ | **RAM Required** | ~3.2 GB (estimated) |
+ | **Recommendation** | Great for reasoning; slightly faster than Q5_K_M. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** prompt format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |---------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see base model for full terms.
Qwen3-4B-Q6_K/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - thinking-mode
+ base_model: Qwen/Qwen3-4B
+ author: geoffmunn
+ ---
+
+ # Qwen3-4B-Q6_K
+
+ Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q6_K** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 3.1G
+ - **Precision**: Q6_K
+ - **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |-------|-------|
+ | **Quality** | 🔥 Near-FP16 |
+ | **Speed** | 🐌 Slow |
+ | **RAM Required** | ~3.8 GB (estimated) |
+ | **Recommendation** | Excellent fidelity; ideal for RAG, complex logic. Use if RAM allows. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** prompt format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |---------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see base model for full terms.
Qwen3-4B-Q8_0/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - thinking-mode
+ base_model: Qwen/Qwen3-4B
+ author: geoffmunn
+ ---
+
+ # Qwen3-4B-Q8_0
+
+ Quantized version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) at **Q8_0** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 4.0G
+ - **Precision**: Q8_0
+ - **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |-------|-------|
+ | **Quality** | 🏆 Near-lossless |
+ | **Speed** | 🐌 Slow |
+ | **RAM Required** | ~4.8 GB (estimated) |
+ | **Recommendation** | Highest quality without FP16; perfect for accuracy-critical tasks. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** prompt format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |---------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see base model for full terms.
Qwen3-4B-f16:Q2_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5e3f5acb28ecd1d689294d2b574b0a4e07a3ee5d4c8f618def7c47b3df9f85c5
+ size 1669499616
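The `.gguf` entries in this commit are git-lfs pointer files in the three-line key/value format shown above. A minimal sketch parsing one (the helper is ours, for illustration only):

```python
# Sketch: parse a git-lfs pointer file (version / oid / size lines)
# into a dict, converting the byte count to an integer.
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:5e3f5acb28ecd1d689294d2b574b0a4e07a3ee5d4c8f618def7c47b3df9f85c5
size 1669499616"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # 1669499616 (the Q2_K file, ~1.6 GB)
```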
Qwen3-4B-f16:Q3_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6628f3ccc26094007223013611c02ecd39ffbbdf5a88568a977f45ff10aca4ef
+ size 2075618016
Qwen3-4B-f16:Q3_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:41724cf25bd576fe5ac57cb3df87409aebae164374ed81748dd2ee3f28b27913
+ size 1886997216
Qwen3-4B-f16:Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9cd01e523a1c16c9855c96af29a3ce8a0e44e762e5b9984d1397ee64bb96c8db
+ size 3306261216
Qwen3-4B-f16:Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fee4d9cdff7bf4e43f88efccb8b7ddeaabfd3da47f5030704195e4e67b68e2e4
+ size 4280405216
README.md CHANGED
@@ -13,53 +13,32 @@ author: geoffmunn
 
  # Qwen3-4B-GGUF
 
- This is a **GGUF-quantized version** of the **[Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)** language model, converted for use with `llama.cpp` and compatible inference engines (e.g., OpenWebUI, LM Studio).
+ This is a **GGUF-quantized version** of the **[Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)** language model, converted for use with `llama.cpp` and compatible inference engines (e.g., OpenWebUI, LM Studio, GPT4All).
-
- ## Model Details
-
- - **Base Model**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
- - **Conversion Tool**: [`llama.cpp`](https://github.com/ggerganov/llama.cpp)
- - **Architecture**: Causal Language Model
- - **License**: apache-2.0 (see base model for details)
 
  ## Available Quantizations (from f16)
 
  The following variants were built starting from an **f16** base model, ensuring consistent quality across all versions.
 
- | Level | Quality | Speed | Size Est. |
- |----------|--------|-------|----------|
- | Q4_K_S | Lower | Fast | ~3.0 GB |
- | Q4_K_M | Balanced | Fast | ~3.2 GB |
- | Q5_K_S | High | Slower | ~3.6 GB |
- | Q5_K_M | ✅✅ High | Medium | ~3.8 GB |
-
- > 💡 Tip: Use `Q5_K_M` for best quality/speed balance ideal for reasoning tasks.
-
- ## Generation Parameters
-
- | Parameter | Value | Meaning |
- |---------------|---------|--------|
- | **Temperature** | `0.6` | Controls randomness. A moderate value like `0.6` balances creativity and consistency. |
- | **Top-P (nucleus)** | `0.95` | Dynamically selects top tokens covering 95% probability mass. |
- | **Top-K** | `20` | Only considers top 20 most probable tokens. |
- | **Min-P** | `0.0` | Minimum threshold relative to top token. Compatible with advanced samplers. |
- | **Stream** | `false` | Set to `true` for real-time token streaming. |
-
- > 🛠 You can set these in your app depending on use case:
- > - Lower temp (~0.2–0.4) for coding or factual QA.
- > - Higher temp (~0.7–0.9) for brainstorming or creative writing.
-
- ```bash
- curl http://192.168.1.10:11434/api/generate -s -N -d '{
-   "model": "hf.co/geoffmunn/Qwen3-4B-GGUF:Q4_K_M",
-   "prompt": "A bat and a ball cost 1.10 together. The bat costs 1.00 more than the ball. How much does the ball cost?",
-   "temperature": 0.6,
-   "top_p": 0.95,
-   "top_k": 20,
-   "min_p": 0,
-   "stream": false
- }' | jq
- ```
+ | Level | Quality | Speed | Size Est. | Recommendation |
+ |----------|--------------|----------|-----------|----------------|
+ | Q2_K | Very Low | ⚡ Fastest | ~1.6 GB | Only on very weak hardware; poor reasoning. Avoid if possible. |
+ | Q3_K_S | Low | Fast | ~1.8 GB | Minimally viable for simple tasks; avoid for reasoning. |
+ | Q3_K_M | Low-Medium | Fast | ~2.0 GB | Acceptable for basic chat on older CPUs. |
+ | Q4_K_S | Medium | 🚀 Fast | ~2.3 GB | Good for low-end devices; decent performance. |
+ | Q4_K_M | ✅ Balanced | 🚀 Fast | ~2.4 GB | Best speed/quality balance for most users. Ideal for laptops & general use. |
+ | Q5_K_S | High | 🐢 Medium | ~2.7 GB | Great for reasoning; slightly faster than Q5_K_M. |
+ | Q5_K_M | ✅✅ High | 🐢 Medium | ~2.7 GB | Top choice for reasoning & coding. Recommended for desktops & strong laptops. |
+ | Q6_K | 🔥 Near-FP16 | 🐌 Slow | ~3.1 GB | Excellent fidelity; ideal for RAG, complex logic. Use if RAM allows. |
+ | Q8_0 | 🏆 Near-lossless | 🐌 Slow | ~4.0 GB | Highest quality without FP16; perfect for accuracy-critical tasks. Recommended when full fidelity is needed. |
+
+ > 💡 **Recommendations by Use Case**
+ >
+ > - 💻 **Low-end CPU / Mac Mini / Old Laptop**: `Q4_K_M`
+ > - 🖥️ **Standard Laptop (M1/M2 Mac, i5/i7)**: `Q5_K_M` (best overall)
+ > - 🧠 **Reasoning, Coding, Math**: `Q6_K` or `Q8_0`
+ > - 🔍 **RAG, Retrieval, Precision Tasks**: `Q8_0`
+ > - 📦 **Storage-Constrained Devices**: `Q4_K_S` or `Q4_K_M`
+ > - 🛠️ **Development & Testing**: Always test across Q4_K_M–Q8_0 to validate robustness.
 
  ## Usage
 
@@ -67,19 +46,17 @@ Load this model using:
  - [OpenWebUI](https://openwebui.com)
  - [LM Studio](https://lmstudio.ai)
  - [GPT4All](https://gpt4all.io)
- - Or directly via `llama.cpp`:
+ - Or directly via `llama.cpp`
 
- ```bash
- ./main -m Qwen3-4B-f16-Q5_K_M.gguf -p "Explain quantum entanglement simply."
- ```
+ Each model includes its own `README.md` and `MODELFILE` for optimal configuration.
 
  ## Verification
 
- This repo includes `SHA256SUMS.txt` to verify file integrity after download:
+ Use `SHA256SUMS.txt` to verify file integrity:
 
  ```bash
  sha256sum -c SHA256SUMS.txt
  ```
 
  ## Author
 
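The size estimates in the quantization table can be sanity-checked against the LFS byte counts elsewhere in this commit, and turned into a ballpark RAM figure. A small sketch (the +0.5 GB overhead for KV cache and runtime state is our assumption for illustration, not a measured number):

```python
# Rough sizing helper: convert the LFS byte counts from this commit to GB
# and add a ballpark overhead for KV cache / runtime state. The 0.5 GB
# overhead is an assumed figure, not a measurement.
QUANT_SIZES_BYTES = {
    "Q2_K": 1_669_499_616,
    "Q3_K_S": 1_886_997_216,
    "Q3_K_M": 2_075_618_016,
    "Q6_K": 3_306_261_216,
    "Q8_0": 4_280_405_216,
}

def rough_ram_gb(size_bytes: int, overhead_gb: float = 0.5) -> float:
    """Estimated resident memory: model file size plus assumed overhead."""
    return round(size_bytes / 1e9 + overhead_gb, 1)

for level, size in QUANT_SIZES_BYTES.items():
    print(level, rough_ram_gb(size), "GB")
```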
SHA256SUMS.txt CHANGED
@@ -1,5 +1,9 @@
+ 5e3f5acb28ecd1d689294d2b574b0a4e07a3ee5d4c8f618def7c47b3df9f85c5  Qwen3-4B-f16:Q2_K.gguf
+ 6628f3ccc26094007223013611c02ecd39ffbbdf5a88568a977f45ff10aca4ef  Qwen3-4B-f16:Q3_K_M.gguf
+ 41724cf25bd576fe5ac57cb3df87409aebae164374ed81748dd2ee3f28b27913  Qwen3-4B-f16:Q3_K_S.gguf
  94a57f361a039e16250669511948ad87d4a52da94930a7a4b215db14f7b7da45  Qwen3-4B-f16:Q4_K_M.gguf
  7e6525fa15695cd2cd2d3112eacd38f775e4a7b9630518aa76f55506755937b6  Qwen3-4B-f16:Q4_K_S.gguf
  39399c8ec5a1d77b656b968161c4f4ea29ef51e63a0a9c4c657f4d379c5cec8d  Qwen3-4B-f16:Q5_K_M.gguf
  0df57f3ef40f374dac3263bb6bb567adf865c56e80f0aeed9ff09cd8e36ff5a7  Qwen3-4B-f16:Q5_K_S.gguf
- fee4d9cdff7bf4e43f88efccb8b7ddeaabfd3da47f5030704195e4e67b68e2e4  Qwen3-4B-q8_0.gguf
+ 9cd01e523a1c16c9855c96af29a3ce8a0e44e762e5b9984d1397ee64bb96c8db  Qwen3-4B-f16:Q6_K.gguf
+ fee4d9cdff7bf4e43f88efccb8b7ddeaabfd3da47f5030704195e4e67b68e2e4  Qwen3-4B-f16:Q8_0.gguf
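The `sha256sum -c` check used throughout these cards can also be mirrored in plain Python where the `coreutils` tool is unavailable. A self-contained sketch (the function name is ours; a real run would point `root` at the download directory):

```python
import hashlib
import tempfile
from pathlib import Path

def verify_manifest(manifest_text: str, root: Path) -> dict:
    """Check files under `root` against SHA256SUMS.txt-style lines
    ("<hex digest>  <filename>"), mirroring `sha256sum -c`."""
    results = {}
    for line in manifest_text.strip().splitlines():
        digest, name = line.split(maxsplit=1)
        path = root / name
        if not path.exists():
            results[name] = "MISSING"
        else:
            actual = hashlib.sha256(path.read_bytes()).hexdigest()
            results[name] = "OK" if actual == digest else "FAILED"
    return results

# Demo with a throwaway file so the sketch runs anywhere.
root = Path(tempfile.mkdtemp())
(root / "demo.gguf").write_bytes(b"hello")
manifest = hashlib.sha256(b"hello").hexdigest() + "  demo.gguf"
print(verify_manifest(manifest, root))  # {'demo.gguf': 'OK'}
```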