MLX[LMStudio] Enable Preserved Thinking Mode ?

#20

by kartavyabagga - opened Dec 28, 2025

Discussion

kartavyabagga

Dec 28, 2025

Explain how to enable Preserved Thinking Mode in MLX LMStudio

Sephyi

2 days ago

The honest answer: Preserved Thinking is not natively supported in MLX-LM or LMStudio at the moment. Here’s why and what you can do about it.

How Preserved Thinking actually works:

The model generates <think>...</think> blocks in its responses.
On the next turn, the framework passes those thinking blocks back into the context unmodified (instead of stripping them).
The Jinja chat template controls this via the clear_thinking variable: when it is false, thinking content stays in the message history.
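
To make the mechanism above concrete, here is a minimal pure-Python sketch of what the flag does to the history. It is not the model's actual template: build_history and THINK_BLOCK are hypothetical names, and the real logic lives in the Jinja chat template, which may differ in details.

```python
import re

# Matches one <think>...</think> block plus any whitespace after it.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def build_history(messages, clear_thinking=True):
    """Return the message list a chat template would see (hypothetical helper).

    clear_thinking=True  -> reasoning is stripped from assistant turns
    clear_thinking=False -> reasoning is kept verbatim (Preserved Thinking)
    """
    history = []
    for message in messages:
        content = message["content"]
        if clear_thinking and message["role"] == "assistant":
            content = THINK_BLOCK.sub("", content)
        history.append({"role": message["role"], "content": content})
    return history


messages = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "<think>Simple arithmetic.</think>\n4"},
    {"role": "user", "content": "Now double it."},
]

stripped = build_history(messages, clear_thinking=True)
preserved = build_history(messages, clear_thinking=False)

print(stripped[1]["content"])
print(preserved[1]["content"])
```

With clear_thinking=True the assistant turn is reduced to its final answer; with clear_thinking=False the history is passed through unchanged, which is the behavior Preserved Thinking relies on.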

Workaround for MLX-LM (manual):

You can simulate it by manually preserving the <think> content in your conversation history. If you're calling mlx_lm programmatically:

# After each response, keep the full output including <think> blocks
# in the messages list when building the next turn
messages.append({
    "role": "assistant",
    "content": full_response_with_think_blocks  # don't strip <think>...</think>
})
messages.append({
    "role": "user", 
    "content": next_user_message
})

The key is making sure the <think> content reaches the Jinja chat template. If mlx_lm is stripping it before templating, you'd need to patch the template or the generation code.
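
One way to check whether the <think> content survives templating is to push a marker through the renderer and look for it in the final prompt. This is only a sketch: thinking_reaches_prompt and the two stand-in renderers are hypothetical, written so the example runs without loading a model.

```python
def thinking_reaches_prompt(render, messages, marker="REASONING-MARKER-12345"):
    """Return True if reasoning from an earlier assistant turn survives templating.

    `render` is any callable that turns a message list into the prompt string.
    """
    probe = list(messages) + [
        {"role": "assistant", "content": f"<think>{marker}</think>\nDone."},
        {"role": "user", "content": "Continue."},
    ]
    return marker in render(probe)


# Stand-in renderers (hypothetical) that mimic the two possible behaviors.
def keeping_render(messages):
    # Passes every message through untouched.
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)


def stripping_render(messages):
    # Drops everything up to and including the last </think> in each message.
    return "\n".join(
        f"{m['role']}: {m['content'].split('</think>')[-1].strip()}"
        for m in messages
    )


base = [{"role": "user", "content": "Hello"}]
keeps = thinking_reaches_prompt(keeping_render, base)
strips = thinking_reaches_prompt(stripping_render, base)
print(keeps, strips)
```

In real use, render would wrap your tokenizer's templating call, for example a lambda around tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True). I'm assuming here that the tokenizer you get from mlx_lm exposes that Hugging Face method. If the check returns False, the template (or something before it) is stripping the reasoning.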

If you need real Preserved Thinking today, SGLang is your best bet for local deployment. As far as I know, it's the only self-hosted framework with official support.
