update perplexity graph with full Q8_0 baseline
- README.md +3 -1
- images/perplexity.png +2 -2
README.md
CHANGED
@@ -377,6 +377,7 @@ numactl -N ${SOCKET} -m ${SOCKET} \
 ## Quick Start
 You might need to override the template as needed. The original is here: https://huggingface.co/moonshotai/Kimi-K2-Thinking/blob/main/chat_template.jinja
 You can do stuff like `--jinja --chat-template-file ./my-custom-template.jinja`.
+You may also need to pass `--special` for it to output `<think>` and `</think>` tags correctly depending on the endpoint and client used, thanks [u/Melodic-Network4374](https://www.reddit.com/r/LocalLLaMA/comments/1oqo57j/comment/nnpqxjx/)
 
 ```bash
 # Example running Hybrid CPU+GPU(s) on ik_llama.cpp
@@ -396,7 +397,8 @@ You can do stuff like `--jinja --chat-template-file ./my-custom-template.jinja`.
 --host 127.0.0.1 \
 --port 8080 \
 --no-mmap \
---jinja
+--jinja \
+--special
 
 # Example running mainline llama.cpp
 # remove `-mla 3` from commands and you should be :gucci:
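Pulled out of diff context, the flags touched by this commit would sit in a full server launch roughly like this. This is a sketch only: the binary name, model path, and template file below are placeholder assumptions, not part of the commit.

```shell
# Sketch of a llama-server launch combining the flags from the hunks above.
# The model path and template file are placeholders, not from this commit.
./llama-server \
  --model ./Kimi-K2-Thinking-Q8_0.gguf \
  --host 127.0.0.1 \
  --port 8080 \
  --no-mmap \
  --jinja \
  --chat-template-file ./my-custom-template.jinja \
  --special
```

Note the trailing `\` added after `--jinja`: without it, the shell would treat `--special` on the next line as a separate command rather than another argument.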
images/perplexity.png
CHANGED
Git LFS Details