No thinking tags when it runs?
I'm getting an issue where the thinking tags never appear, which results in no markdown formatting and makes the model unusable. I've only tested it with Kimi-K2-Thinking-UD-Q3_K_XL. The same setup works fine with DeepSeek R1, for example. I don't know whether this is an issue with the latest llama.cpp or with the GGUF.
I run with the following:
llama-server --model Kimi-K2-Thinking-UD-Q3_K_XL-00001-of-00010.gguf -ts 99,0 -fa on --temp 1.0 --min-p 0.01 -c 131072 --threads 38 -ngl 99 --n-cpu-moe 54
I have this problem too with GLM4.5 and GLM4.6 quantized by Unsloth, but it's intermittent: the </think> only appears about 80% of the time, and the longer the context, the lower the chance of it showing up. Not sure if it's related.
You have to run with --special, and then the think token will show up. This is normal, expected behavior; see the example below.
CC: @Disdrix @AliceThirty
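For example, this is just the command from the first post with --special appended (same quant and offload settings, adjust for your own setup):

llama-server --model Kimi-K2-Thinking-UD-Q3_K_XL-00001-of-00010.gguf -ts 99,0 -fa on --temp 1.0 --min-p 0.01 -c 131072 --threads 38 -ngl 99 --n-cpu-moe 54 --special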
Thanks @danielhanchen - that worked. It would be good if this were added to the unsloth guide, as it doesn't seem documented anywhere and I've seen several people asking in various forums.
The only downside is that now it ends every answer with <|im_end|>. Maybe a template issue?
Also, it seems to have an identity crisis and thinks it's Claude. Probably an issue with the base model, but funny.
The only downside is that now it ends every answer with <|im_end|>.
This is intended behavior when printing special tokens; <|im_end|> is a special token, after all. You can set <|im_end|> as a stop string in Open WebUI.
Most front ends (ST, etc.) should support that.
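If you're hitting the server API directly rather than going through a front end, the stop string can also be passed per request. A rough sketch against llama-server's OpenAI-compatible endpoint, assuming the default port 8080:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "stop": ["<|im_end|>"]}'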
Thanks @danielhanchen - that worked. It would be good if this were added to the unsloth guide, as it doesn't seem documented anywhere and I've seen several people asking in various forums.
We added it here: https://docs.unsloth.ai/models/kimi-k2-and-thinking-how-to-run-locally#thinking-tags
Another update: after some number of messages back and forth, the markdown formatting fails again. It seems --special did not fix the issue entirely.
This seems to happen repeatedly around 2000 tokens. My initial prompt was "write a story", followed by "continue the story" a couple of times.
Here you can see it didn't even bother reasoning and went straight to normal text generation. It then did a weird thing where it added a think token and repeated that section of the story again, exactly. Sometimes at this point it will actually reason, but with no markdown. After it finished, I did another "continue the story" and it did reason this time, but again no markdown.