Many chat template fixes
#6 · pinned · opened by danielhanchen
Sorry this took some time, but please re-download the GGUFs - we managed to fix the following issues:
- Endless generations
- Gibberish
- Incorrect responses
- Multi-turn fails after 1 turn
Try:
./llama.cpp/llama-cli --model unsloth/ERNIE-4.5-21B-A3B-Thinking-GGUF/ERNIE-4.5-21B-A3B-Thinking-UD-Q2_K_XL.gguf \
--jinja -ngl 99 --temp 0.6 --min-p 0.01 --ctx-size 16384
danielhanchen pinned discussion
The problem still exists :(
ERNIE-4.5-21B-A3B-Thinking-Q2_K.gguf
ERNIE-4.5-21B-A3B-Thinking-UD-Q2_K_XL.gguf
etc.
@Mercyiris
Note you need to replace your old versions by re-downloading:
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id = "unsloth/ERNIE-4.5-21B-A3B-Thinking-GGUF",
    local_dir = "unsloth/ERNIE-4.5-21B-A3B-Thinking-GGUF",
    allow_patterns = ["*Q2_K_XL*"],
)
Try the below:
./llama.cpp/llama-cli --model unsloth/ERNIE-4.5-21B-A3B-Thinking-GGUF/ERNIE-4.5-21B-A3B-Thinking-UD-Q2_K_XL.gguf \
--jinja -ngl 99 --temp 0.6 --min-p 0.01 --ctx-size 16384
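If you want to double-check that the re-downloaded file is the new one and not a truncated or stale copy, one quick sanity check is the GGUF magic header: every valid GGUF file starts with the 4 bytes b"GGUF". A minimal sketch (the helper name looks_like_gguf is made up here, and the path is just the local_dir used in the download above - adjust it to wherever your file landed):

```python
from pathlib import Path

def looks_like_gguf(path: Path) -> bool:
    """Return True if the file starts with the 4-byte GGUF magic."""
    with path.open("rb") as f:
        return f.read(4) == b"GGUF"

# Assumed local path from the snapshot_download call above; adjust as needed.
model = Path("unsloth/ERNIE-4.5-21B-A3B-Thinking-GGUF/"
             "ERNIE-4.5-21B-A3B-Thinking-UD-Q2_K_XL.gguf")
if model.exists():
    print("valid GGUF header:", looks_like_gguf(model))
```

This only confirms the file is a GGUF container, not that it carries the fixed chat template, but it catches the common case of an interrupted download.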
Thank you, it works now!
.\build\bin\llama-cli.exe --model "unsloth\ERNIE-4.5-21B-A3B-Thinking-GGUF\ERNIE-4.5-21B-A3B-Thinking-UD-Q2_K_XL.gguf" --jinja -ngl 99 --ctx-size 16384 --temp 0.7 --top-k 40 --top-p 0.9 --min-p 0.05 --repeat-penalty 1.1 --repeat-last-n 256
Using these settings might work better and avoid this problem on heavily quantized models :)

