Many chat template fixes
#6 · pinned · opened by danielhanchen
Sorry this took some time, but please re-download the GGUFs - we managed to fix the following issues:
- Endless generations
- Gibberish
- Incorrect responses
- Multi-turn fails after 1 turn
Try:
./llama.cpp/llama-cli --model unsloth/ERNIE-4.5-21B-A3B-Thinking-GGUF/ERNIE-4.5-21B-A3B-Thinking-UD-Q2_K_XL.gguf \
--jinja -ngl 99 --temp 0.6 --min-p 0.01 --ctx-size 16384
danielhanchen pinned discussion
The problem still exists :(
ERNIE-4.5-21B-A3B-Thinking-Q2_K.gguf
ERNIE-4.5-21B-A3B-Thinking-UD-Q2_K_XL.gguf
etc.
@Mercyiris
Note you need to replace your old versions by re-downloading:
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id = "unsloth/ERNIE-4.5-21B-A3B-Thinking-GGUF",
    local_dir = "unsloth/ERNIE-4.5-21B-A3B-Thinking-GGUF",
    allow_patterns = ["*Q2_K_XL*"],
)
Try the below:
./llama.cpp/llama-cli --model unsloth/ERNIE-4.5-21B-A3B-Thinking-GGUF/ERNIE-4.5-21B-A3B-Thinking-UD-Q2_K_XL.gguf \
--jinja -ngl 99 --temp 0.6 --min-p 0.01 --ctx-size 16384
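If you want to double-check that the re-downloaded file is the new one and not a truncated or stale copy, one quick sanity check is the GGUF magic header: every valid GGUF file starts with the 4 bytes b"GGUF". A minimal sketch (the helper name looks_like_gguf is made up here, and the path is just the local_dir used in the download above - adjust it to wherever your file landed):

```python
from pathlib import Path

def looks_like_gguf(path: Path) -> bool:
    """Return True if the file starts with the 4-byte GGUF magic."""
    with path.open("rb") as f:
        return f.read(4) == b"GGUF"

# Assumed local path from the snapshot_download call above; adjust as needed.
model = Path("unsloth/ERNIE-4.5-21B-A3B-Thinking-GGUF/"
             "ERNIE-4.5-21B-A3B-Thinking-UD-Q2_K_XL.gguf")
if model.exists():
    print("valid GGUF header:", looks_like_gguf(model))
```

This only confirms the file is a GGUF container, not that it carries the fixed chat template, but it catches the common case of an interrupted download.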
Thank you, it works now!
.\build\bin\llama-cli.exe --model "unsloth\ERNIE-4.5-21B-A3B-Thinking-GGUF\ERNIE-4.5-21B-A3B-Thinking-UD-Q2_K_XL.gguf" --jinja -ngl 99 --ctx-size 16384 --temp 0.7 --top-k 40 --top-p 0.9 --min-p 0.05 --repeat-penalty 1.1 --repeat-last-n 256
Using these settings might work better and avoid this problem on heavily quantized models :)

