No thinking tags when it runs?

#1 opened by Disdrix

Getting an issue where the thinking tags never appear, which results in no markdown formatting and makes the model unusable. I've only tested it with Kimi-K2-Thinking-UD-Q3_K_XL. This works fine with DeepSeek R1, just as an example. I don't know whether this is an issue with the latest llama.cpp or with the GGUF.

I run with the following:
llama-server --model Kimi-K2-Thinking-UD-Q3_K_XL-00001-of-00010.gguf -ts 99,0 -fa on --temp 1.0 --min-p 0.01 -c 131072 --threads 38 -ngl 99 --n-cpu-moe 54


I have this problem too with GLM4.5 and GLM4.6 quantized by Unsloth, but it's random. The </think> only appears about 80% of the time, and the longer the context, the lower the probability of success. Not sure if it's related.

You have to run with --special; then it comes up and you'll see the think token. This is normal, expected behavior.
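
For reference, with the command from the original post that would look something like this (same flags, just with --special appended; assuming a recent llama.cpp build where llama-server accepts the flag):

# same command as before, plus --special so special/think tokens are printed
llama-server --model Kimi-K2-Thinking-UD-Q3_K_XL-00001-of-00010.gguf -ts 99,0 -fa on --temp 1.0 --min-p 0.01 -c 131072 --threads 38 -ngl 99 --n-cpu-moe 54 --special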

CC: @Disdrix @AliceThirty

Thanks @danielhanchen - that worked. It would be good if this were added to the unsloth guide, as it doesn't seem documented anywhere and I've seen several people asking in various forums.

The only downside is that now it ends every answer with <|im_end|>. Maybe a template issue?

Also, it seems to have an identity crisis and thinks it's Claude. Probably an issue with the base model, but funny.

The only downside is that now it ends every answer with <|im_end|>.

Intended behavior when printing special tokens; <|im_end|> is a special token, after all. You can set <|im_end|> as a stop string in Open WebUI.


Most front ends (ST, etc.) should support that.
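
If your front end lets you pass extra request parameters, a rough sketch of setting the stop string directly against llama-server's OpenAI-compatible endpoint (assumes the default port 8080; the prompt here is just a placeholder):

# ask the server to stop at <|im_end|>; assumes llama-server's default port 8080
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "Hello"}], "stop": ["<|im_end|>"]}'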

Also, it seems to have an identity crisis and thinks it's Claude.

Makes sense

Ok we'll add it to the guide @dsg22 thanks!

We added it here: https://docs.unsloth.ai/models/kimi-k2-and-thinking-how-to-run-locally#thinking-tags

Here is how to get rid of the <|im_end|>: add this to the custom JSON.

{"prompt": "...", "stop": ["<|im_end|>"]}


Another update: after some number of messages back and forth, markdown fails again. It seems --special did not fix the issue entirely.

It seems to happen repeatedly around 2000 tokens. I did a "write a story" initial prompt and then "continue the story" a couple of times.

Here, you can see it didn't even bother reasoning and went right to normal text generation. It then did a weird thing where it added a think token and repeated this section of the story again, exactly. Sometimes at this point it will actually reason, but won't produce markdown. After it finished, I did another "continue the story" and it did reason this time, but still no markdown.

(screenshots)
