Many chat template fixes

#6
by danielhanchen - opened
Unsloth AI org
β€’
edited Oct 2

Sorry this took some time, but please re-download the GGUFs - we managed to fix the following issues:

  1. Endless generations
  2. Gibberish
  3. Incorrect responses
  4. Multi-turn fails after 1 turn
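
These were all chat template fixes, so one way to confirm that a downloaded file carries the updated template is to dump it from the GGUF metadata. A rough sketch, assuming the gguf Python package (pip install gguf) and a local copy of the file:

from gguf import GGUFReader

# Memory-map the downloaded GGUF and print its embedded chat template.
reader = GGUFReader("ERNIE-4.5-21B-A3B-Thinking-UD-Q2_K_XL.gguf")
field = reader.fields.get("tokenizer.chat_template")
if field is None:
    print("no chat template embedded in this file")
else:
    # For string fields the last part holds the raw UTF-8 bytes of the value.
    print(bytes(field.parts[-1]).decode("utf-8"))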

Try:

./llama.cpp/llama-cli --model unsloth/ERNIE-4.5-21B-A3B-Thinking-GGUF/ERNIE-4.5-21B-A3B-Thinking-UD-Q2_K_XL.gguf \
    --jinja -ngl 99 --temp 0.6 --min-p 0.01 --ctx-size 16384

For example:
(screenshot of example output)
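
If you'd rather exercise the multi-turn fix from Python, here is a minimal sketch using the llama-cpp-python bindings; it assumes they are installed (pip install llama-cpp-python), that they pick up the GGUF's embedded chat template (their default when no chat_format is forced), and that the path matches the download below:

from llama_cpp import Llama

llm = Llama(
    model_path = "unsloth/ERNIE-4.5-21B-A3B-Thinking-GGUF/ERNIE-4.5-21B-A3B-Thinking-UD-Q2_K_XL.gguf",
    n_gpu_layers = 99,
    n_ctx = 16384,
)

# Three assistant turns back to back to check that generation still behaves after the first reply.
messages = [{"role": "user", "content": "Name one prime number below 50."}]
for turn in range(3):
    reply = llm.create_chat_completion(
        messages = messages, temperature = 0.6, min_p = 0.01, max_tokens = 256
    )
    content = reply["choices"][0]["message"]["content"]
    print(f"--- turn {turn + 1} ---\n{content}")
    messages.append({"role": "assistant", "content": content})
    messages.append({"role": "user", "content": "Now name a different one."})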

danielhanchen pinned discussion

The problem still exists :(
ERNIE-4.5-21B-A3B-Thinking-Q2_K.gguf
ERNIE-4.5-21B-A3B-Thinking-UD-Q2_K_XL.gguf
etc.

Unsloth AI org
β€’
edited Oct 2

@Mercyiris
Note you need to remove your old versions and re-download the files via:

from huggingface_hub import snapshot_download
# Re-download just the Q2_K_XL files from the latest revision into the local folder
snapshot_download(
    repo_id = "unsloth/ERNIE-4.5-21B-A3B-Thinking-GGUF",
    local_dir = "unsloth/ERNIE-4.5-21B-A3B-Thinking-GGUF",
    allow_patterns = ["*Q2_K_XL*"],
)
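
If an old copy still gets picked up, snapshot_download's standard force_download option re-fetches the files regardless of what is already on disk; a minimal sketch:

from huggingface_hub import snapshot_download
snapshot_download(
    repo_id = "unsloth/ERNIE-4.5-21B-A3B-Thinking-GGUF",
    local_dir = "unsloth/ERNIE-4.5-21B-A3B-Thinking-GGUF",
    allow_patterns = ["*Q2_K_XL*"],
    force_download = True,  # re-download even if a cached copy exists
)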

Try the below:

./llama.cpp/llama-cli --model unsloth/ERNIE-4.5-21B-A3B-Thinking-GGUF/ERNIE-4.5-21B-A3B-Thinking-UD-Q2_K_XL.gguf \
    --jinja -ngl 99 --temp 0.6 --min-p 0.01 --ctx-size 16384
Unsloth AI org

I retried it and it works fine

(screenshot of the working output)

Thank you, it works now!
.\build\bin\llama-cli.exe --model "unsloth\ERNIE-4.5-21B-A3B-Thinking-GGUF\ERNIE-4.5-21B-A3B-Thinking-UD-Q2_K_XL.gguf" --jinja -ngl 99 --ctx-size 16384 --temp 0.7 --top-k 40 --top-p 0.9 --min-p 0.05 --repeat-penalty 1.1 --repeat-last-n 256
Using these settings might work better to avoid this problem on heavily quantized models :)
