Question about quanting the bf16

#3 opened by TPH441

If I were to download your bf16 and quantize the experts to Q4_0 and the rest to Q8_0, would that be lossless for the experts, since the original model used INT4 for them?

Hmm, hard to say - llama.cpp's Q4_0 is different from INT4, since I think Q4_0 uses float16 scales whilst INT4 uses bfloat16 scales.
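For context, here's a minimal sketch of why the scale formats matter. It assumes llama.cpp's `block_q4_0` layout (redeclared here for illustration) and a hypothetical INT4 block with a bf16 scale; the overflow value is just an example:

```c
// Minimal sketch, assuming llama.cpp's block_q4_0 layout (redeclared
// here for illustration) and a hypothetical INT4 block with a bf16
// scale. It shows the core issue: fp16 has a smaller exponent range
// than bf16, so a bf16 scale may not survive the round trip.
#include <stdint.h>
#include <stdio.h>

#define QK4_0 32

// Q4_0 block as in llama.cpp: one fp16 scale, then 32 packed 4-bit quants.
typedef struct {
    uint16_t d;              // scale stored as IEEE fp16 bits
    uint8_t  qs[QK4_0 / 2];  // two 4-bit values per byte
} block_q4_0;

// Hypothetical INT4 block with a bfloat16 scale (an assumption, for contrast).
typedef struct {
    uint16_t d;              // scale stored as bf16 bits
    uint8_t  qs[QK4_0 / 2];
} block_int4_bf16;

// bf16 is the top 16 bits of an fp32, so decoding is just a shift.
static float bf16_to_f32(uint16_t b) {
    union { uint32_t u; float f; } v = { (uint32_t)b << 16 };
    return v.f;
}

int main(void) {
    // Example bf16 scale (~1.0e5) that is representable in bf16 but
    // exceeds fp16's largest finite value (65504), so storing it in
    // Q4_0's fp16 scale field is necessarily lossy.
    uint16_t bf16_scale = 0x47C4;
    float s = bf16_to_f32(bf16_scale);
    printf("bf16 scale = %g\n", s);
    if (s > 65504.0f) {
        printf("out of fp16 range -> Q4_0 cannot store it exactly\n");
    }
    return 0;
}
```

Even when the scale fits in range, bf16's 7-bit mantissa fits inside fp16's 10 bits, so the remaining questions are the group size and the zero-point convention (Q4_0 dequantizes as `(q - 8) * d`), which would also have to match the original INT4 format for the conversion to be lossless.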
