IQ2_KS
Thanks for these quants!
Are you doing an IQ2_KS for this one (equivalent to https://huggingface.co/ubergarm/Kimi-K2-Instruct-GGUF/tree/main/IQ2_KS)?
Thanks! It fits (smaller than the IQ2_KS).
```
[print_timings] prompt eval time     =  35271.83 ms / 5656 tokens (  6.24 ms per token, 160.35 tokens per second)
[print_timings] generation eval time = 191089.56 ms / 2945 runs   ( 64.89 ms per token,  15.41 tokens per second)
```
Did the -ooae flag get removed from ik_llama recently?
This is really good. The logic doesn't break down even at > 12k context and it seems to remember details from earlier in the chat.
It also doesn't just shift over to "You're absolutely right, I'm sorry" when I push back during problem solving.
Why is this one so good? Is the quant a lot better this time around (vs the IQ2_KS for the other Kimi models), or is the model just a lot smarter?
> Did the -ooae flag get removed from ik_llama recently?
I can't find the exact PR, but ooae is now the default and can be disabled with `--no-ooae`, I believe.
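In launch-command terms it's roughly the difference below (the model path and everything besides the ooae flags are placeholders, not something from this thread):

```bash
# older ik_llama.cpp builds: the optimization had to be requested explicitly
./llama-server -m /models/Kimi-K2.gguf -ooae

# newer builds: it's on by default, so the flag can simply be dropped,
# and --no-ooae turns it back off if you want the old behaviour
./llama-server -m /models/Kimi-K2.gguf
./llama-server -m /models/Kimi-K2.gguf --no-ooae
```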
> Why is this one so good? Is the quant a lot better this time around (vs the IQ2_KS for the other Kimi models), or is the model just a lot smarter?
Why not both? Haha... honestly I'm not sure; I'm still exploring how well the QAT actually translated over to the various GGUF quantization types in terms of relative perplexity.
Thanks for some of your comments in other discussions about `--special` and needing to fix up the `<|im_end|>` stop token!
> Thanks for some of your comments in other discussions about `--special` and needing to fix up the `<|im_end|>` stop token!
No problem, that one burned me in February when I distilled R1 -> Mistral-Large and it wouldn't print the special tokens in llama.cpp.
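For anyone who hits the same thing, this is roughly the kind of workaround meant here (the token id below is a placeholder; the real `<|im_end|>` id depends on the model's vocab):

```bash
# --special makes llama-cli / llama-server emit special tokens like <|im_end|>
# instead of silently dropping them from the output
./llama-server -m /models/model.gguf --special

# if the GGUF metadata points at the wrong end-of-turn token, it can be
# overridden at load time instead of re-quantizing (placeholder id below)
./llama-server -m /models/model.gguf \
  --override-kv tokenizer.ggml.eos_token_id=int:12345
```

Client-side, adding "<|im_end|>" to the request's stop strings works too, but fixing it at load time means every client benefits.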
> now the default
Thanks, yeah, I noticed a lot of things are on by default now; my scripts are a lot smaller lol