No thinking tags when it runs?
I'm getting an issue where the thinking tags never appear, which results in no markdown formatting and makes the model unusable. I've only tested it with Kimi-K2-Thinking-UD-Q3_K_XL. The same setup works fine with DeepSeek R1, for example. I don't know whether this is an issue with the latest llama.cpp or with the GGUF.
I run with the following:
llama-server --model Kimi-K2-Thinking-UD-Q3_K_XL-00001-of-00010.gguf -ts 99,0 -fa on --temp 1.0 --min-p 0.01 -c 131072 --threads 38 -ngl 99 --n-cpu-moe 54
I have this problem too with GLM4.5 and GLM4.6 quantized by Unsloth, but it's intermittent: the </think> only appears about 80% of the time, and the longer the context, the lower the chance of it showing up. Not sure if it's related.
You have to run with --special, and then the think token will show up. This is normal, expected behavior; see the example below.
CC: @Disdrix @AliceThirty
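For example, this is just the command from the first post with --special appended (same quant and offload settings, adjust for your own setup):

llama-server --model Kimi-K2-Thinking-UD-Q3_K_XL-00001-of-00010.gguf -ts 99,0 -fa on --temp 1.0 --min-p 0.01 -c 131072 --threads 38 -ngl 99 --n-cpu-moe 54 --special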
Thanks @danielhanchen - that worked. It would be good if this were added to the unsloth guide, as it doesn't seem documented anywhere and I've seen several people asking in various forums.
The only downside is that now it ends every answer with <|im_end|>. Maybe a template issue?
Also, it seems to have an identity crisis and thinks it's Claude. Probably an issue with the base model, but funny.
The only downside is that now it ends every answer with <|im_end|>.
This is intended behavior when printing special tokens; <|im_end|> is a special token, after all. You can set <|im_end|> as a stop string in Open WebUI.
Most front ends (ST, etc.) should support that.
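If you're hitting the server API directly rather than going through a front end, the stop string can also be passed per request. A rough sketch against llama-server's OpenAI-compatible endpoint, assuming the default port 8080:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "stop": ["<|im_end|>"]}'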
Thanks @danielhanchen - that worked. It would be good if this were added to the unsloth guide, as it doesn't seem documented anywhere and I've seen several people asking in various forums.
We added it here: https://docs.unsloth.ai/models/kimi-k2-and-thinking-how-to-run-locally#thinking-tags
Another update: after some number of messages back and forth, the markdown formatting fails again. It seems --special did not fix the issue entirely.
This seems to happen repeatedly around 2000 tokens. My initial prompt was "write a story", followed by "continue the story" a couple of times.
Here you can see it didn't even bother reasoning and went straight to normal text generation. It then did a weird thing where it added a think token and repeated that section of the story again, exactly. Sometimes at this point it will actually reason, but with no markdown. After it finished, I did another "continue the story" and it did reason this time, but again no markdown.