update perplexity graph with full Q8_0 baseline
- README.md +3 -1
- images/perplexity.png +2 -2
README.md
CHANGED
@@ -377,6 +377,7 @@ numactl -N ${SOCKET} -m ${SOCKET} \
 ## Quick Start
 You might need to override the template as needed. The original is here: https://huggingface.co/moonshotai/Kimi-K2-Thinking/blob/main/chat_template.jinja
 You can do stuff like `--jinja --chat-template-file ./my-custom-template.jinja`.
+You may also need to pass `--special` for it to output `<think>` and `</think>` tags correctly depending on the endpoint and client used, thanks [u/Melodic-Network4374](https://www.reddit.com/r/LocalLLaMA/comments/1oqo57j/comment/nnpqxjx/)
 
 ```bash
 # Example running Hybrid CPU+GPU(s) on ik_llama.cpp
@@ -396,7 +397,8 @@ You can do stuff like `--jinja --chat-template-file ./my-custom-template.jinja`.
 --host 127.0.0.1 \
 --port 8080 \
 --no-mmap \
---jinja
+--jinja \
+--special
 
 # Example running mainline llama.cpp
 # remove `-mla 3` from commands and you should be :gucci:
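Pulled out of diff context, the flags touched by this commit would sit in a full server launch roughly like this. This is a sketch only: the binary name, model path, and template file below are placeholder assumptions, not part of the commit.

```shell
# Sketch of a llama-server launch combining the flags from the hunks above.
# The model path and template file are placeholders, not from this commit.
./llama-server \
  --model ./Kimi-K2-Thinking-Q8_0.gguf \
  --host 127.0.0.1 \
  --port 8080 \
  --no-mmap \
  --jinja \
  --chat-template-file ./my-custom-template.jinja \
  --special
```

Note the trailing `\` added after `--jinja`: without it, the shell would treat `--special` on the next line as a separate command rather than another argument.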
images/perplexity.png
CHANGED
Git LFS Details