Q4-Q6 chance?

by cchance27 - opened

Any chance you could upload a Q4 to Q6 version? Normally Q5-ish is the sweet spot: low VRAM usage while staying close in quality to the Q8/FP16 weights.
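For context, once a model format is supported, producing these quants with llama.cpp's quantize tool is normally a single command per level. A minimal sketch, assuming llama.cpp is already built and the model has been converted to an FP16 GGUF (file names here are hypothetical):

```
# Hypothetical file names; assumes an FP16 GGUF already exists,
# which is exactly the step that's missing without llama.cpp support.
./llama-quantize model-f16.gguf model-Q5_K_M.gguf Q5_K_M
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```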

It doesn't have support in llama.cpp, so I'd first need to implement it. And since there's no inference support yet, I don't really see a reason to.
