Question about group size
#1 by nephepritou - opened
There is an opinion that a group size of 64 or 128 should give better quality. I wonder why 32 was chosen, and how I (or anyone else) could reproduce the same quantization process with a different group size. Is it possible? Would it be a waste of time? Or does it simply cost too much despite being slightly better?
Hi @nephepritou, yes, quantization group sizes of 64 and 128 are possible using llmcompressor. 32 was chosen because it was tested to give higher quality, at the cost of a larger quantized model size.
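If you want to try a different group size yourself, here is a rough sketch of a one-shot GPTQ run with llmcompressor where the weight group size is set explicitly (64 in this sketch). The model ID, calibration dataset, and output path are placeholders, and exact class and argument names can differ between llmcompressor versions, so treat this as a starting point rather than the exact recipe used for this checkpoint.

```python
# Sketch: 4-bit grouped weight quantization with an explicit group size.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from compressed_tensors.quantization import (
    QuantizationArgs,
    QuantizationScheme,
    QuantizationStrategy,
    QuantizationType,
)

MODEL_ID = "org/model-name"  # placeholder: the model you want to quantize

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Change group_size to 64 or 128 to compare against the 32 used here.
recipe = GPTQModifier(
    ignore=["lm_head"],
    config_groups={
        "group_0": QuantizationScheme(
            targets=["Linear"],
            weights=QuantizationArgs(
                num_bits=4,
                type=QuantizationType.INT,
                symmetric=True,
                strategy=QuantizationStrategy.GROUP,
                group_size=64,
            ),
        )
    },
)

oneshot(
    model=model,
    recipe=recipe,
    dataset="open_platypus",        # placeholder calibration dataset
    max_seq_length=2048,
    num_calibration_samples=512,
)

model.save_pretrained("model-w4a16-g64", save_compressed=True)
tokenizer.save_pretrained("model-w4a16-g64")
```

A larger group size mainly reduces the amount of quantization metadata (scales/zero points), which is why the checkpoint gets smaller, while a smaller group size like 32 tracks the weights more closely and tends to preserve quality better.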
In addition, and specifically for this model, tensor-parallel-size can be set to 2 without enable-expert-parallel at a quantization group size of 32.
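For reference, a minimal sketch of that serving setup with vLLM's Python API. The model path is a placeholder, and exposing `enable_expert_parallel` as a constructor argument (mirroring the `--enable-expert-parallel` CLI flag) depends on your vLLM version, so check your version's engine arguments.

```python
# Sketch: serve the group-size-32 checkpoint with tensor parallelism of 2
# and expert parallelism left disabled.
from vllm import LLM, SamplingParams

llm = LLM(
    model="path/or/repo-of-this-quantized-model",  # placeholder
    tensor_parallel_size=2,        # same as --tensor-parallel-size 2
    enable_expert_parallel=False,  # same as omitting --enable-expert-parallel
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```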