Quants without imatrix
Are the quants without IQ in the name unmolested by imatrix?
Cheers
> unmolested
lol!
fyi - you can inspect the quants yourself. Click the first gguf file in a split eg:
and look for:
(Looks like they've all had imatrix calibration applied except for the Q8_0)
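If you'd rather check locally than through the web viewer, here's a minimal sketch using the `gguf` Python package (`pip install gguf`); the filename is just an example, and the `quantize.imatrix.*` key names follow what llama-quantize writes when an imatrix is supplied:

```python
# Minimal sketch: list imatrix-related metadata keys in a GGUF file.
# Filename is an example; quants made with an imatrix carry
# quantize.imatrix.* keys written by llama-quantize.
from gguf import GGUFReader

reader = GGUFReader("model-Q5_K_M-00001-of-00019.gguf")
imatrix_keys = [k for k in reader.fields if "imatrix" in k]
for key in imatrix_keys:
    print(key)  # e.g. quantize.imatrix.dataset, quantize.imatrix.entries_count
if not imatrix_keys:
    print("no imatrix metadata found - likely a static quant")
```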
I see, thanks. Not sure what the Q5 has to offer over unsloth's though. Maybe I'm just worn out of downloading all these files lol. I mean, 4.5 works.
You'll want imatrix at lower sizes anyways
And at larger sizes they're not as important
I can't speak to the difference between this and unsloth's. In most tests the models seem to trade blows in quality at different sizes: sometimes my smaller ones are better than his bigger, sometimes his smaller are better than my bigger. It's unfortunately pretty up in the air, and when the models are so massive it's hard to get enough useful benchmarks in a timely manner.
I'm after q5_k_m. I think that's probably around the level where I don't want imatrix anymore. I want to ask it esoteric questions like life advice, not for it to be optimized for normie things I don't mind asking to the APIs. Not that I know for sure it makes any difference. I found one with no imatrix, and I'm downloading part 5 of 19.
As an aside, smaller parts are less of a nightmare when the download fails (which seems to happen a lot these days - for me anyway).
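For what it's worth, a resumable pull takes some of the pain out of failed downloads; a sketch with `huggingface_hub` (the repo id and pattern below are placeholders, not the exact repo from this thread):

```python
# Sketch of a resumable split-file download (pip install -U huggingface_hub).
# Partial files are resumed, so a dropped connection doesn't restart
# a 19-part download from scratch.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="bartowski/SOME-MODEL-GGUF",  # placeholder repo id
    allow_patterns=["*Q5_K_M*"],          # only the parts you want
    local_dir="./model-q5km",
)
```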
The imatrix I use won't be optimized for anything in particular; the goal is to improve quality across the board a bit rather than focus on any one domain. I don't think you'll find it worse with my imatrix than without.
> I'm after q5_k_m. I think that's probably around the level where I don't want imatrix anymore. I want to ask it esoteric questions like life advice, not for it to be optimized for normie things
The importance matrix does not impart knowledge on the model or specialize the model to a particular domain. It's just picking which numbers are most suitable for more aggressive rounding based on the magnitude of their contribution to the final model output
> The importance matrix does not impart knowledge on the model or specialize the model to a particular domain. It's just picking which numbers are most suitable for more aggressive rounding based on the magnitude of their contribution to the final model output
More aggressive rounding of things I might want, though? I thought the imatrix is optimized for whatever tests people used when making it. I'll just take my chances with whatever q5km spits out, knowing it's lower resolution than native but no favouritism involved.
> More aggressive rounding of things I might want, though?
It's not that certain weights are for X subject and other weights are for Y subject. The goal of quantization (and imatrix calibration) is just to preserve the model's quality overall. It accomplishes this by massaging the rows and columns of the tensors to preserve the model's overall behaviour at a lower file size. So even if you were to use an imatrix calibration dataset that only included STEM-related text, it still wouldn't meaningfully bias the model's performance in favor of STEM subjects. That's just not how it works.
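A toy sketch of the mechanism, with made-up numbers (an illustration of importance-weighted rounding error, not llama.cpp's actual block format): the quantizer picks a scale for a block of weights, and the importance values change which scale minimizes error, so high-contribution weights get rounded more carefully.

```python
# Toy illustration of imatrix-style quantization: choose a block scale that
# minimizes *importance-weighted* rounding error instead of plain error.
import numpy as np

weights = np.array([0.11, -0.42, 0.03, 0.97])  # one block of float weights
importance = np.array([1.0, 8.0, 0.5, 2.0])    # e.g. mean squared activations

def quantize_block(w, imp, levels=16):
    best_scale, best_err = 0.1, np.inf
    for s in np.linspace(0.05, 0.2, 64):       # candidate scales
        q = np.round(w / s).clip(-levels // 2, levels // 2 - 1) * s
        err = np.sum(imp * (w - q) ** 2)       # weighted squared error
        if err < best_err:
            best_scale, best_err = s, err
    return np.round(w / best_scale).clip(-levels // 2, levels // 2 - 1) * best_scale

print(quantize_block(weights, importance))             # imatrix-weighted
print(quantize_block(weights, np.ones_like(weights)))  # plain rounding
```

Either way every weight lands on the same grid; the importance just steers where the unavoidable rounding error goes.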
> > More aggressive rounding of things I might want, though?
>
> It's not that certain weights are for X subject and other weights are for Y subject. The goal of quantization (and imatrix calibration) is just to preserve the model's quality overall. It accomplishes this by massaging the rows and columns of the tensors to preserve the model's overall behaviour at a lower file size. So even if you were to use an imatrix calibration dataset that only included STEM-related text, it still wouldn't meaningfully bias the model's performance in favor of STEM subjects. That's just not how it works.
I don't really get it, to be honest. I mean, if arbitrary weights end up at a lower bpw than the ones deemed important, isn't there worse performance in terms of resolution?
Sonnet 4.5 says this (taken with a grain of salt as always):
> More aggressive rounding of things I might want, though?
Depends. If you want random spelling mistakes, Chinese characters in the middle of words and syntax errors in code, then you should probably avoid iMatrix'd quants ;)
If you want to avoid the quants that appear coherent / fine for STEM but incoherent for what you're after, you'll want to make sure they haven't over-compressed the ffn_down.weight (see the sketch below for checking this).
That won't be a problem for Bart's q5_k_m.
Unsloth's would probably be fine as well, but I noticed they've started modifying/customizing the chat templates lately.
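A quick sketch for checking that per-tensor compression locally, again with the `gguf` package (the filename is an example):

```python
# Sketch: print the quantization type of each ffn_down tensor in a GGUF file,
# to see how aggressively those layers were compressed.
from gguf import GGUFReader

reader = GGUFReader("model-Q5_K_M-00001-of-00019.gguf")
for tensor in reader.tensors:
    if "ffn_down" in tensor.name:
        print(tensor.name, tensor.tensor_type)  # e.g. GGMLQuantizationType.Q6_K
```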
> Depends. If you want random spelling mistakes, Chinese characters in the middle of words and syntax errors in code, then you should probably avoid iMatrix'd quants ;)
Unironically this is probably what I'm wanting in my use case here.
What I really need is bart/unsloth quants without the imatrix at q5km but it is what it is!
That's pretty niche but if you're experimenting, you'll want to grab this:
https://huggingface.co/unsloth/GLM-4.6-GGUF/tree/main/BF16
Then just quant it to q5_k yourself with the llama.cpp convert_hf_to_gguf.py script.
> That's pretty niche but if you're experimenting, you'll want to grab this:
> https://huggingface.co/unsloth/GLM-4.6-GGUF/tree/main/BF16
> Then just quant it to q5_k yourself with the llama.cpp convert_hf_to_gguf.py script.
No joke, even if I could eventually download all that data, I haven't got enough disk space left for it ;)
> Then just quant it to q5_k yourself with the llama.cpp convert_hf_to_gguf.py script
You need to use llama-quantize; convert_hf_to_gguf.py is for converting from safetensors to GGUF.
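For reference, a sketch of the two steps (paths are placeholders; starting from unsloth's BF16 GGUF, you'd only need the second step):

```python
# Sketch of the corrected pipeline: convert_hf_to_gguf.py makes the BF16 GGUF
# from safetensors, then llama-quantize does the actual Q5_K_M quantization.
import subprocess

subprocess.run(
    ["python", "convert_hf_to_gguf.py", "path/to/hf-model",
     "--outfile", "model-bf16.gguf", "--outtype", "bf16"],
    check=True,
)
# Note: no --imatrix flag here, which is the point for a static quant.
subprocess.run(
    ["llama-quantize", "model-bf16.gguf", "model-Q5_K_M.gguf", "Q5_K_M"],
    check=True,
)
```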
🤦‍♂️ My bad
Follow up: the q5km without imatrix is working well, for what it's worth. Already got some quality advice out of it (system prompt: You are a psychoanalyst and advisor.)

