Absurd sizes.

#12
by ZeroWw - opened

It's absurd that different quantizations all come out at the same 11 GB size.
I don't see the advantage.
Also:
the quantized versions don't work well in llama.cpp

Unsloth AI org

For quantizing, llama.cpp has limitations atm and I think they're working on fixing it. Then we can make proper quants for it with many different sizes :)

Could you explain what you mean by "they don't work well"? Accuracy, speed?

(Note: I wrote this before I realised there is a more detailed discussion of the same points in https://huggingface.co/unsloth/gpt-oss-20b-GGUF/discussions/2.) Besides, the sizes for fixed bit-width quantisations don't add up: a 20B model at 16 bits should be around 40 GB, and at 8 bits at least 20 GB. Edit: I just read in the other thread that it seems to be generated from an FP4 original. While the size calculations still apply, they could be completely irrelevant if there isn't any more information than 4 bits per parameter anyway (and it is not obvious to me how any quant above 4 bits could make sense, at least not information-wise, though maybe for utilizing specific hardware optimizations).
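
For reference, the naive arithmetic behind those numbers, as a small sketch (it only counts the raw weight payload, so real GGUF files, with metadata, block scales, and higher-precision embedding/output layers, come out somewhat larger):

```python
# Rough GGUF size estimate for a ~20B-parameter model at fixed bit widths.
# Only the weight payload (params * bits / 8) is counted here.
params = 20e9  # assumption: ~20 billion parameters, all quantised uniformly

for bits in (16, 8, 6, 4, 2):
    size_gb = params * bits / 8 / 1e9
    print(f"{bits:>2}-bit: ~{size_gb:.0f} GB")

# 16-bit: ~40 GB
#  8-bit: ~20 GB
#  6-bit: ~15 GB
#  4-bit: ~10 GB
#  2-bit: ~5 GB
```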

Yes, I understand... but the Q2 size should be almost half of the Q4 size, for example.

At first I was disappointed by the small difference in sizes across all the quants. The layers in the original OpenAI files are already mostly in MXFP4 format, so I just went and used the original and didn't bother with the Unsloth GGUF.

Then I decided to give the UD Q6_XL a try. In my case it is awesome compared to the original OpenAI version. I only have a 3060 Ti with 8 GB and an Intel i3-10100 CPU, so saving 1.8 GB of VRAM helps.

I'm able to get 15-16 tokens/sec with a context of 16,384 and 15 layers in VRAM using llama.cpp, compared to 11-12 tokens/sec with a context of 8,196 and 14 layers in VRAM using the original.
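
In case it helps anyone reproduce a similar split, a rough sketch of the kind of invocation I mean (the GGUF filename and prompt are placeholders; `-ngl` is the number of layers offloaded to VRAM, `-c` is the context size):

```python
# Sketch: launching llama.cpp's llama-cli with partial GPU offload,
# roughly matching the settings above (16,384 context, 15 layers in VRAM).
import subprocess

model = "gpt-oss-20b-UD-Q6_K_XL.gguf"  # placeholder path, adjust to your file

subprocess.run([
    "llama-cli",
    "-m", model,
    "-c", "16384",   # context size
    "-ngl", "15",    # layers kept in VRAM
    "-p", "Hello",   # placeholder prompt
])
```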
