Question about group size

by nephepritou - opened

There is an opinion that a group size of 64 or 128 should give better quality. I wonder why 32 was chosen, and how I (or anyone else) could reproduce the same quantization process with a different group size. Is it possible? Would it be a waste of time? Or does it simply cost too much while being only slightly better?

cyankiwi org

Hi @nephepritou , yes, quantization group sizes of 64 and 128 are possible using llmcompressor. 32 was chosen because it was tested to give higher quality, at the cost of a larger quantized model size.
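For anyone who wants to try it, here is a rough sketch of how a different group size could be set with llmcompressor. The base model ID, calibration dataset, and output directory are placeholders, the recipe assumes a 4-bit weight-only (W4A16-style) group scheme, and exact import paths or recipe fields may vary slightly between llmcompressor versions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot

MODEL_ID = "path/or/id-of-the-unquantized-base-model"  # placeholder

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# YAML recipe: same idea as group quantization at size 32, but with
# group_size set to 64 (or 128) instead.
recipe = """
quant_stage:
    quant_modifiers:
        GPTQModifier:
            ignore: ["lm_head"]
            config_groups:
                group_0:
                    targets: ["Linear"]
                    weights:
                        num_bits: 4
                        type: "int"
                        symmetric: true
                        strategy: "group"
                        group_size: 64
"""

# One-shot calibration pass over a small dataset, then save the
# compressed checkpoint.
oneshot(
    model=model,
    dataset="open_platypus",            # any calibration dataset you trust
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

model.save_pretrained("model-W4A16-g64", save_compressed=True)
tokenizer.save_pretrained("model-W4A16-g64")
```

Larger group sizes mainly reduce the number of stored scales/zero-points, so the checkpoint shrinks a little, while smaller group sizes track the weight distribution more closely.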

In addition, and specifically for this model, tensor-parallel-size can be set to 2 without enable-expert-parallel when the quantization group size is 32.
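As a rough illustration with vLLM's offline Python API (the model ID is a placeholder, and the note about expert parallelism is based on the comment above, not on anything verified here):

```python
from vllm import LLM, SamplingParams

# Sketch only: at group size 32 the quantized weights shard cleanly across
# two GPUs, so expert parallelism does not need to be turned on.
llm = LLM(
    model="this-repo-or-your-requantized-model",  # placeholder
    tensor_parallel_size=2,
    # enable_expert_parallel is left off here; per the comment above it would
    # only be needed if the group size did not divide evenly after
    # tensor-parallel sharding (assumption).
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

The equivalent flags on the command line would be `--tensor-parallel-size 2` without `--enable-expert-parallel` on `vllm serve`.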
