MLX[LMStudio] Enable Preserved Thinking Mode?

#20
by kartavyabagga - opened

Explain how to enable Preserved Thinking Mode in MLX LMStudio

The honest answer: Preserved Thinking is not natively supported in MLX-LM or LMStudio at the moment. Here’s why and what you can do about it.

How Preserved Thinking actually works:

  1. The model generates <think>...</think> blocks in its responses
  2. On the next turn, the framework passes those thinking blocks back into the context unmodified (instead of stripping them)
  3. The Jinja chat template controls this via the clear_thinking variable: when it is false, thinking content stays in the message history
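
To make the three steps above concrete, here is a minimal pure-Python sketch of what a template with a clear_thinking switch effectively does to the history. The function name build_history and the regex are hypothetical and only for illustration; the real logic lives in the model's Jinja template and may differ in detail.

```python
import re

# Matches a <think>...</think> block plus any whitespace after it.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def build_history(messages, clear_thinking=True):
    """Return a copy of `messages` as the template would see them.

    Hypothetical helper: mimics a template that drops thinking from
    earlier assistant turns when clear_thinking is true.
    """
    out = []
    for msg in messages:
        content = msg["content"]
        if msg["role"] == "assistant" and clear_thinking:
            content = THINK_RE.sub("", content)
        out.append({"role": msg["role"], "content": content})
    return out

history = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "<think>Simple addition.</think>4"},
    {"role": "user", "content": "And doubled?"},
]

preserved = build_history(history, clear_thinking=False)  # thinking kept
stripped = build_history(history, clear_thinking=True)    # thinking removed
```

With clear_thinking=False the assistant turn goes back into the context untouched, which is the behavior Preserved Thinking relies on.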

Workaround for MLX-LM (manual):

You can simulate it by manually preserving the <think> content in your conversation history. If you're calling mlx_lm programmatically:

# After each response, keep the full output including <think> blocks
# in the messages list when building the next turn
messages.append({
    "role": "assistant",
    "content": full_response_with_think_blocks  # don't strip <think>...</think>
})
messages.append({
    "role": "user",
    "content": next_user_message
})

The key is making sure the <think> content reaches the Jinja chat template. If mlx_lm is stripping it before templating, you'd need to patch the template or the generation code.
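
One way to check whether the thinking survives templating is to render the prompt as text and look for the think content in it. The sketch below assumes you already have the rendered prompt as a string (for example from the tokenizer's apply_chat_template with tokenize=False, if your setup exposes it that way); missing_think_blocks is a hypothetical helper, not part of mlx_lm.

```python
import re

THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def missing_think_blocks(messages, rendered_prompt):
    """List think-block bodies from assistant turns that are absent
    from the rendered prompt (hypothetical diagnostic helper)."""
    missing = []
    for msg in messages:
        if msg["role"] != "assistant":
            continue
        for body in THINK_BLOCK.findall(msg["content"]):
            body = body.strip()
            if body and body not in rendered_prompt:
                missing.append(body)
    return missing

messages = [
    {"role": "user", "content": "Pick a number."},
    {"role": "assistant", "content": "<think>Seven feels right.</think>7"},
    {"role": "user", "content": "Why that one?"},
]

# Stand-in strings for what a template might produce in each case.
kept_prompt = "user: Pick a number.\nassistant: <think>Seven feels right.</think>7\nuser: Why that one?"
stripped_prompt = "user: Pick a number.\nassistant: 7\nuser: Why that one?"

kept_result = missing_think_blocks(messages, kept_prompt)
stripped_result = missing_think_blocks(messages, stripped_prompt)
```

An empty result means every think block reached the prompt; a non-empty one means something between your messages list and the model is stripping them.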

If you need real Preserved Thinking today, SGLang is your best bet for local deployment. It's the only self-hosted framework with official support.
