Quants without imatrix
Are the quants without IQ in the name unmolested by imatrix?
Cheers
> unmolested
lol!
fyi - you can inspect the quants yourself. Click the first gguf file in a split eg:
and look for:
(Looks like they've all had imatrix calibration applied except for the Q8_0)
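If you'd rather check locally than through the web viewer, here's a minimal sketch using the `gguf` Python package (`pip install gguf`); the filename is just an example, and the `quantize.imatrix.*` key names follow what llama-quantize writes when an imatrix is supplied:

```python
# Minimal sketch: list imatrix-related metadata keys in a GGUF file.
# Filename is an example; quants made with an imatrix carry
# quantize.imatrix.* keys written by llama-quantize.
from gguf import GGUFReader

reader = GGUFReader("model-Q5_K_M-00001-of-00019.gguf")
imatrix_keys = [k for k in reader.fields if "imatrix" in k]
for key in imatrix_keys:
    print(key)  # e.g. quantize.imatrix.dataset, quantize.imatrix.entries_count
if not imatrix_keys:
    print("no imatrix metadata found - likely a static quant")
```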
I see, thanks. Not sure what the Q5 has to offer over unsloth's though. Maybe I'm just worn out of downloading all these files lol. I mean, 4.5 works.
You'll want imatrix at lower sizes anyways
And at larger sizes they're not as important
I can't speak to the difference between this and unsloth's. In most tests the models seem to trade blows in quality at different sizes: sometimes my smaller ones are better than his bigger, sometimes his smaller are better than my bigger. It's unfortunately pretty up in the air, and when the models are so massive it's hard to get enough useful benchmarks in a timely manner.
I'm after q5_k_m. I think that's probably around the level where I don't want imatrix anymore. I want to ask it esoteric questions like life advice, not for it to be optimized for normie things I don't mind asking to the APIs. Not that I know for sure it makes any difference. I found one with no imatrix, and I'm downloading part 5 of 19.
As an aside, smaller parts are less of a nightmare when the download fails (which seems to happen a lot these days - for me anyway).
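For what it's worth, a resumable pull takes some of the pain out of failed downloads; a sketch with `huggingface_hub` (the repo id and pattern below are placeholders, not the exact repo from this thread):

```python
# Sketch of a resumable split-file download (pip install -U huggingface_hub).
# Partial files are resumed, so a dropped connection doesn't restart
# a 19-part download from scratch.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="bartowski/SOME-MODEL-GGUF",  # placeholder repo id
    allow_patterns=["*Q5_K_M*"],          # only the parts you want
    local_dir="./model-q5km",
)
```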
The imatrix I use won't be optimized for anything in particular; the goal is to improve quality across the board a bit rather than focus on any one domain. I don't think you'll find it worse with my imatrix than without.
> I'm after q5_k_m. I think that's probably around the level where I don't want imatrix anymore. I want to ask it esoteric questions like life advice, not for it to be optimized for normie things
The importance matrix does not impart knowledge on the model or specialize the model to a particular domain. It's just picking which numbers are most suitable for more aggressive rounding based on the magnitude of their contribution to the final model output
> The importance matrix does not impart knowledge on the model or specialize the model to a particular domain. It's just picking which numbers are most suitable for more aggressive rounding based on the magnitude of their contribution to the final model output
More aggressive rounding of things I might want, though? I thought the imatrix is optimized for whatever tests people used when making it. I'll just take my chances with whatever q5km spits out, knowing it's lower resolution than native but no favouritism involved.
> More aggressive rounding of things I might want, though?
It's not that certain weights are for X subject and other weights are for Y subject. The goal of quantization (and imatrix calibration) is just to preserve the model's quality overall. It accomplishes this by massaging the rows and columns of the tensors to preserve the model's overall behaviour at a lower file size. So even if you were to use an imatrix calibration dataset that only included STEM-related text, it still wouldn't meaningfully bias the model's performance in favor of STEM subjects. That's just not how it works.
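A toy sketch of the mechanism, with made-up numbers (an illustration of importance-weighted rounding error, not llama.cpp's actual block format): the quantizer picks a scale for a block of weights, and the importance values change which scale minimizes error, so high-contribution weights get rounded more carefully.

```python
# Toy illustration of imatrix-style quantization: choose a block scale that
# minimizes *importance-weighted* rounding error instead of plain error.
import numpy as np

weights = np.array([0.11, -0.42, 0.03, 0.97])  # one block of float weights
importance = np.array([1.0, 8.0, 0.5, 2.0])    # e.g. mean squared activations

def quantize_block(w, imp, levels=16):
    best_scale, best_err = 0.1, np.inf
    for s in np.linspace(0.05, 0.2, 64):       # candidate scales
        q = np.round(w / s).clip(-levels // 2, levels // 2 - 1) * s
        err = np.sum(imp * (w - q) ** 2)       # weighted squared error
        if err < best_err:
            best_scale, best_err = s, err
    return np.round(w / best_scale).clip(-levels // 2, levels // 2 - 1) * best_scale

print(quantize_block(weights, importance))             # imatrix-weighted
print(quantize_block(weights, np.ones_like(weights)))  # plain rounding
```

Either way every weight lands on the same grid; the importance just steers where the unavoidable rounding error goes.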
> > More aggressive rounding of things I might want, though?
>
> It's not that certain weights are for X subject and other weights are for Y subject. The goal of quantization (and imatrix calibration) is just to preserve the model's quality overall. It accomplishes this by massaging the rows and columns of the tensors to preserve the model's overall behaviour at a lower file size. So even if you were to use an imatrix calibration dataset that only included STEM-related text, it still wouldn't meaningfully bias the model's performance in favor of STEM subjects. That's just not how it works.
I don't really get it, to be honest. I mean, if arbitrary weights end up at a lower bpw than the ones deemed important, isn't there worse performance in terms of resolution?
Sonnet 4.5 says this (taken with a grain of salt as always):
> More aggressive rounding of things I might want, though?
Depends. If you want random spelling mistakes, Chinese characters in the middle of words and syntax errors in code, then you should probably avoid iMatrix'd quants ;)
If you want to avoid the quants that appear coherent / fine for STEM but incoherent for what you're after, you'll want to make sure they haven't over-compressed the ffn_down.weight (see the sketch below for checking this).
That won't be a problem for Bart's q5_k_m.
Unsloth's would probably be fine as well, but I noticed they've started modifying/customizing the chat templates lately.
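A quick sketch for checking that per-tensor compression locally, again with the `gguf` package (the filename is an example):

```python
# Sketch: print the quantization type of each ffn_down tensor in a GGUF file,
# to see how aggressively those layers were compressed.
from gguf import GGUFReader

reader = GGUFReader("model-Q5_K_M-00001-of-00019.gguf")
for tensor in reader.tensors:
    if "ffn_down" in tensor.name:
        print(tensor.name, tensor.tensor_type)  # e.g. GGMLQuantizationType.Q6_K
```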
> Depends. If you want random spelling mistakes, Chinese characters in the middle of words and syntax errors in code, then you should probably avoid iMatrix'd quants ;)
Unironically this is probably what I'm wanting in my use case here.
What I really need is bart/unsloth quants without the imatrix at q5km but it is what it is!
That's pretty niche but if you're experimenting, you'll want to grab this:
https://huggingface.co/unsloth/GLM-4.6-GGUF/tree/main/BF16
Then just quant it to q5_k yourself with the llama.cpp convert_hf_to_gguf.py script.
> That's pretty niche but if you're experimenting, you'll want to grab this:
> https://huggingface.co/unsloth/GLM-4.6-GGUF/tree/main/BF16
> Then just quant it to q5_k yourself with the llama.cpp convert_hf_to_gguf.py script.
No joke, even if I could eventually download all that data, I haven't got enough disk space left for it ;)
> Then just quant it to q5_k yourself with the llama.cpp convert_hf_to_gguf.py script
You need to use llama-quantize; convert_hf_to_gguf.py is for converting from safetensors to GGUF.
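For reference, a sketch of the two steps (paths are placeholders; starting from unsloth's BF16 GGUF, you'd only need the second step):

```python
# Sketch of the corrected pipeline: convert_hf_to_gguf.py makes the BF16 GGUF
# from safetensors, then llama-quantize does the actual Q5_K_M quantization.
import subprocess

subprocess.run(
    ["python", "convert_hf_to_gguf.py", "path/to/hf-model",
     "--outfile", "model-bf16.gguf", "--outtype", "bf16"],
    check=True,
)
# Note: no --imatrix flag here, which is the point for a static quant.
subprocess.run(
    ["llama-quantize", "model-bf16.gguf", "model-Q5_K_M.gguf", "Q5_K_M"],
    check=True,
)
```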
🤦‍♂️ My bad
Follow up: the q5km without imatrix is working well, for what it's worth. Already got some quality advice out of it (system prompt: You are a psychoanalyst and advisor.)

