## ~~imatrix~~ Quantization of moonshotai/Kimi-K2-Instruct-0905

Converted with mainline llama.cpp PR#17069 and quantized with ik_llama.cpp. In limited testing, the one available quant runs inference on both forks.
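
A minimal sketch of that conversion step, assuming the PR#17069 branch is checked out and the original safetensors sit in `Kimi-K2-Instruct-0905/` (file names and the bf16 intermediate are illustrative, not the exact commands used):

```bash
# Convert the original safetensors to a bf16 GGUF with mainline llama.cpp;
# the PR#17069 branch is needed for this architecture.
python convert_hf_to_gguf.py \
    --outtype bf16 \
    --outfile Kimi-K2-Instruct-0905-BF16.gguf \
    Kimi-K2-Instruct-0905/
```
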
## Big Thanks

Shout out to Wendell and the **Level1Techs** crew, the community [Forums](https://forum.level1techs.com/t/deepseek-deep-dive-r1-at-home/225826), and the [YouTube Channel](https://www.youtube.com/@Level1Techs)! **BIG thanks** for providing **BIG hardware** expertise and access to run these experiments and make these great quants available to the community!!!

This is an interesting one: given the original model design, only a single quant is currently available. The `Q8_0-Q4_0` uses `q4_0` for the routed experts and `q8_0` for all other tensors. It works on both ik_llama.cpp and mainline llama.cpp in limited testing. It does *not* use an imatrix!
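
A hedged sketch of how a recipe like this can be expressed with `llama-quantize` via ik_llama.cpp's `--custom-q` per-tensor overrides; the regex and file names are illustrative, not necessarily the exact invocation used:

```bash
# Route the experts to q4_0; the trailing Q8_0 is the default type
# applied to every tensor the regex does not match. No --imatrix is
# passed, matching the note above.
./build/bin/llama-quantize \
    --custom-q "ffn_(up|down|gate)_exps=q4_0" \
    Kimi-K2-Instruct-0905-BF16.gguf \
    Kimi-K2-Instruct-0905-Q8_0-Q4_0.gguf \
    Q8_0
```
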
Compare with the baseline perplexity of the full-size `Q8_0-Q4_0` at 543.617 GiB (4.549 BPW).
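
The `Final estimate` line below is what the standard perplexity run over `wiki.test.raw` prints; a sketch with illustrative file names:

```bash
# Measure perplexity over wiki.test.raw in the default 512-token chunks;
# the run ends with a "Final estimate: PPL = ..." line.
./build/bin/llama-perplexity \
    -m Kimi-K2-Instruct-0905-Q8_0-Q4_0.gguf \
    -f wiki.test.raw \
    --ctx-size 512
```
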
Final estimate: PPL = TODO

I may try to make a smaller one, e.g. `smol-IQ1_KT` or `smol-IQ2_KS` or similar, but I'm not sure how well it will go: the original is QAT'd with `compressed-tensors` into a format *very similar* to q4_0, using bf16 block scales instead of fp16 but the same 32 weights per block.
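
For scale: a `q4_0` block packs 32 weights at 4 bits each (16 bytes) plus one 2-byte fp16 scale, i.e. 18 bytes per 32 weights = 4.5 BPW; a bf16 scale is also 2 bytes, so the QAT format matches q4_0 in size and block layout and differs only in how the scale is encoded.
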
## References
* [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp)
* [Getting Started Guide (already out of date lol)](https://github.com/ikawrakow/ik_llama.cpp/discussions/258)
* [ubergarm-imatrix-calibration-corpus-v02.txt](https://gist.github.com/ubergarm/edfeb3ff9c6ec8b49e88cdf627b0711a?permalink_comment_id=5682584#gistcomment-5682584)
* [moonshotai/Kimi-K2-Thinking/discussions/2](https://huggingface.co/moonshotai/Kimi-K2-Thinking/discussions/2)
* [vllm-project/compressed-tensors/issues/511](https://github.com/vllm-project/compressed-tensors/issues/511)
* [llama.cpp PR#17069](https://github.com/ggml-org/llama.cpp/pull/17069#issuecomment-3500870165)