Question about group size

by nephepritou - opened

There is an opinion that a group size of 64 or 128 should give better quality. I wonder why 32 was chosen, and how I (or anyone else) could reproduce the same quantization process with a different group size. Is it possible? Would it be a waste of time? Or does it simply cost too much while being only slightly better?

cyankiwi org

Hi @nephepritou , yes, quantization group sizes of 64 and 128 are possible using llmcompressor. 32 was chosen because it was tested to give higher quality, at the cost of a larger quantized model size.
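For anyone who wants to try it, here is a rough sketch of how a different group size could be set with llmcompressor. The base model ID, calibration dataset, and output directory are placeholders, the recipe assumes a 4-bit weight-only (W4A16-style) group scheme, and exact import paths or recipe fields may vary slightly between llmcompressor versions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot

MODEL_ID = "path/or/id-of-the-unquantized-base-model"  # placeholder

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# YAML recipe: same idea as group quantization at size 32, but with
# group_size set to 64 (or 128) instead.
recipe = """
quant_stage:
    quant_modifiers:
        GPTQModifier:
            ignore: ["lm_head"]
            config_groups:
                group_0:
                    targets: ["Linear"]
                    weights:
                        num_bits: 4
                        type: "int"
                        symmetric: true
                        strategy: "group"
                        group_size: 64
"""

# One-shot calibration pass over a small dataset, then save the
# compressed checkpoint.
oneshot(
    model=model,
    dataset="open_platypus",            # any calibration dataset you trust
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

model.save_pretrained("model-W4A16-g64", save_compressed=True)
tokenizer.save_pretrained("model-W4A16-g64")
```

Larger group sizes mainly reduce the number of stored scales/zero-points, so the checkpoint shrinks a little, while smaller group sizes track the weight distribution more closely.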

In addition, and specifically for this model, tensor-parallel-size can be set to 2 without enable-expert-parallel when the quantization group size is 32.
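As a rough illustration with vLLM's offline Python API (the model ID is a placeholder, and the note about expert parallelism is based on the comment above, not on anything verified here):

```python
from vllm import LLM, SamplingParams

# Sketch only: at group size 32 the quantized weights shard cleanly across
# two GPUs, so expert parallelism does not need to be turned on.
llm = LLM(
    model="this-repo-or-your-requantized-model",  # placeholder
    tensor_parallel_size=2,
    # enable_expert_parallel is left off here; per the comment above it would
    # only be needed if the group size did not divide evenly after
    # tensor-parallel sharding (assumption).
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

The equivalent flags on the command line would be `--tensor-parallel-size 2` without `--enable-expert-parallel` on `vllm serve`.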
