Q4-Q6 chance?
#3
by
cchance27
- opened
Any chance you could upload a Q4 to Q6 version? Normally Q5-ish is the best option for low VRAM while staying close in coherence to the Q8/FP16 weights.
It doesn't have support in llama.cpp, so I'd first need to implement it. Since there is no inference support yet, I don't really see a reason to.