Commit 69e302d · Parent: aadb36a · "Update README.md"
README.md CHANGED

@@ -40,19 +40,21 @@ The core project making use of the ggml library is the [llama.cpp](https://githu
 
 # Quantization variants
 
-There is a bunch of quantized files available.
+There are a number of quantized files available to cater to your specific needs. Here's how to choose the best option for you:
 
 # Legacy quants
 
 Q4_0, Q4_1, Q5_0, Q5_1 and Q8 are `legacy` quantization types.
 Nevertheless, they are fully supported, as there are several circumstances that cause certain models not to be compatible with the modern K-quants.
-
+## Note:
+Now there's a new option to use K-quants even for previously 'incompatible' models, although this involves some fallback solution that makes them not *real* K-quants. More details can be found in the affected model descriptions.
+(This mainly refers to Falcon 7b and Starcoder models.)
 
 # K-quants
 
-K-quants are
+K-quants are designed with the idea that different levels of quantization in specific parts of the model can optimize performance, file size, and memory load.
 So, if possible, use K-quants.
-With a Q6_K you
+With a Q6_K, you'll likely find it challenging to discern a quality difference from the original model - ask your model the same question twice and you may encounter bigger quality differences.
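For readers wondering what a `legacy` quant such as Q4_0 actually does, the diff above can be illustrated with a simplified sketch of block-wise 4-bit quantization: weights are split into blocks of 32, each block stores one scale, and every weight becomes a 4-bit code. This is only an illustration of the idea, not ggml's exact bit layout or packing:

```python
import random

def quantize_q4_block(xs):
    """Quantize one block of 32 weights to 4-bit codes plus one scale.

    Simplified Q4_0-style scheme: a (negative) per-block scale maps each
    weight into the integer range [0, 15]. Illustrative only - the real
    ggml format packs two codes per byte and stores the scale as fp16.
    """
    assert len(xs) == 32
    amax = max(abs(v) for v in xs)
    d = amax / -8.0 if amax > 0 else 1.0          # per-block scale
    q = [min(15, max(0, round(v / d) + 8)) for v in xs]
    return d, q

def dequantize_q4_block(d, q):
    """Reconstruct approximate weights from codes and scale."""
    return [(c - 8) * d for c in q]

random.seed(0)
w = [random.gauss(0.0, 1.0) for _ in range(32)]   # one block of weights
d, q = quantize_q4_block(w)
w_hat = dequantize_q4_block(d, q)
err = max(abs(a - b) for a, b in zip(w, w_hat))
print(f"scale: {d:.4f}, max abs reconstruction error: {err:.4f}")
```

The reconstruction error per weight is bounded by the block scale, which is why a single outlier weight in a block degrades the precision of all 31 of its neighbours - one motivation for the more elaborate K-quant schemes.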
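The K-quant point in the diff - "different levels of quantization in specific parts of the model" - can be sketched with a back-of-the-envelope size estimate. The tensor names, dimensions, and the approximate bits-per-weight figures below (roughly Q6_K-like for sensitive tensors, Q4_K-like for bulk feed-forward weights) are illustrative assumptions, not values read from any particular model file:

```python
# Hypothetical per-tensor bit budget in the K-quant spirit: quality-critical
# tensors (embeddings, attention output) keep more bits per weight, the bulk
# feed-forward matrices get fewer. All names and sizes are made up.
tensors = {
    "token_embd.weight":        (32000 * 4096, 6.5625),   # ~Q6_K-like bpw
    "blk.0.attn_output.weight": (4096 * 4096,  6.5625),
    "blk.0.ffn_up.weight":      (4096 * 11008, 4.5),      # ~Q4_K-like bpw
    "blk.0.ffn_down.weight":    (11008 * 4096, 4.5),
}

total_bits = sum(n * bpw for n, bpw in tensors.values())
fp16_bits = sum(n for n, _ in tensors.values()) * 16

print(f"quantized size: {total_bits / 8 / 2**20:.1f} MiB")
print(f"fp16 size:      {fp16_bits / 8 / 2**20:.1f} MiB")
print(f"compression:    {fp16_bits / total_bits:.2f}x vs fp16")
```

Because most parameters sit in the low-bit feed-forward tensors, the file shrinks close to the 4-bit ratio while the handful of high-bit tensors protect output quality - which is the practical reason the README recommends K-quants when they are available.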