ubergarm committed
Commit e0e221d · 1 Parent(s): a391d75

add references and more info, almost done uploading

Files changed (1): README.md (+5, -2)
## ~~imatrix~~ Quantization of moonshotai/Kimi-K2-Instruct-0905
Converted with mainline llama.cpp PR#17069 and quantized with ik_llama.cpp. The one quant so far runs inference on both forks in limited testing.

## Big Thanks
Shout out to Wendell and the **Level1Techs** crew, the community [Forums](https://forum.level1techs.com/t/deepseek-deep-dive-r1-at-home/225826), [YouTube Channel](https://www.youtube.com/@Level1Techs)! **BIG thanks** for providing **BIG hardware** expertise and access to run these experiments and make these great quants available to the community!!!

Finally, I appreciate all the support from [aifoundry.org](https://aifoundry.org).

This is an interesting one, and currently the only quant available given the original model design. The `Q8_0-Q4_0` uses `q4_0` for the routed experts and `q8_0` for all other tensors. It works on both ik_llama.cpp and mainline llama.cpp in limited testing. It does *not* use an imatrix!
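For reference, here is a minimal C sketch of the two ggml block layouts involved (this mirrors the definitions in ggml's quant headers; the `ggml_half` typedef is simplified here):

```c
#include <stdint.h>

typedef uint16_t ggml_half; // simplified stand-in for ggml's fp16 storage type

// q4_0: 32 weights per block, one fp16 scale plus 16 bytes of packed 4-bit quants
// -> 18 bytes per 32 weights = 4.5 bits per weight (used for the routed experts)
#define QK4_0 32
typedef struct {
    ggml_half d;           // block scale (fp16)
    uint8_t   qs[QK4_0/2]; // two 4-bit quants per byte
} block_q4_0;

// q8_0: 32 weights per block, one fp16 scale plus 32 signed 8-bit quants
// -> 34 bytes per 32 weights = 8.5 bits per weight (used for all other tensors)
#define QK8_0 32
typedef struct {
    ggml_half d;         // block scale (fp16)
    int8_t    qs[QK8_0]; // one int8 quant per weight
} block_q8_0;
```

Since the routed experts hold most of the parameters in this MoE, the overall 4.549 BPW lands just above q4_0's 4.5 bits per weight.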
 
Compare with the baseline perplexity of the full-size `Q8_0-Q4_0`, 543.617 GiB (4.549 BPW):
  Final estimate: PPL = TODO
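As a sanity check on those numbers, a quick back-of-the-envelope in C (the parameter count is derived from the two figures above, not taken from the model card):

```c
#include <stdio.h>

int main(void) {
    // model size and bits-per-weight as reported above
    double gib = 543.617;
    double bpw = 4.549;

    double bits   = gib * 1024.0 * 1024.0 * 1024.0 * 8.0; // GiB -> bits
    double params = bits / bpw;                           // implied weight count

    printf("implied parameters: %.3e\n", params); // prints ~1.027e12, i.e. ~1T
    return 0;
}
```

The implied ~1T parameters is consistent with the Kimi-K2 family's size, so the GiB and BPW figures agree with each other.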
 
I may try to make a smaller one, e.g. `smol-IQ1_KT` or `smol-IQ2_KS` or similar, but I'm not sure how well it will go given that the original is QAT'd with `compressed-tensors` into a format *very similar* to q4_0, except it uses bf16 block scales instead of fp16 with the same 32 weights per block.
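To illustrate that format difference, here is a hypothetical side-by-side in C (the QAT struct and its field names are my own sketch for illustration, not the actual `compressed-tensors` layout):

```c
#include <stdint.h>

// ggml's q4_0: 32-weight blocks with an fp16 scale (5 exponent bits)
typedef struct {
    uint16_t d_fp16; // block scale stored as IEEE fp16
    uint8_t  qs[16]; // 32 x 4-bit quants, two per byte
} block_q4_0;

// hypothetical sketch of the QAT format described above: same 32-weight
// blocks of 4-bit quants, but the scale is bf16 (8 exponent bits), so a
// bf16 scale can fall outside fp16's narrower representable range when
// the weights are repacked into q4_0
typedef struct {
    uint16_t d_bf16; // block scale stored as bfloat16
    uint8_t  qs[16]; // 32 x 4-bit quants, two per byte
} block_qat_int4;    // illustrative name only
```

The bf16-to-fp16 scale conversion is presumably the main lossy step when mapping the QAT weights onto `q4_0`, as the `Q8_0-Q4_0` above does.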
  ## References
  * [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp)
  * [Getting Started Guide (already out of date lol)](https://github.com/ikawrakow/ik_llama.cpp/discussions/258)
  * [ubergarm-imatrix-calibration-corpus-v02.txt](https://gist.github.com/ubergarm/edfeb3ff9c6ec8b49e88cdf627b0711a?permalink_comment_id=5682584#gistcomment-5682584)
* [moonshotai/Kimi-K2-Thinking/discussions/2](https://huggingface.co/moonshotai/Kimi-K2-Thinking/discussions/2)
* [vllm-project/compressed-tensors/issues/511](https://github.com/vllm-project/compressed-tensors/issues/511)
  * [llama.cpp PR#17069](https://github.com/ggml-org/llama.cpp/pull/17069#issuecomment-3500870165)