Ambiguity on interleaved thinking
Thanks for the incredible model!
I am a bit confused about the interleaved thinking, however. In the model card, it says this:
IMPORTANT: MiniMax-M2 is an interleaved thinking model. Therefore, when using it, it is important to retain the thinking content from the assistant's turns within the historical messages. In the model's output content, we use the <think>...</think> format to wrap the assistant's thinking content. When using the model, you must ensure that the historical content is passed back in its original format. Do not remove the <think>...</think> part, otherwise, the model's performance will be negatively affected.
However, in the chat template, it seems you intentionally discard all reasoning content, via:
{#- Only render reasoning_content if no user message follows -#}
and later:
{%- if reasoning_content and loop.index0 > ns.last_user_index -%}
So, should we just update the chat template to remove that conditional and always render the reasoning content, or should we manually inject think tags and reasoning content into the "content" attribute, or neither?
In normal use, simply retain the reasoning part directly in context – there’s no need to modify the logic of the chat template.
Note the difference between user turns and tool_call turns: a user turn (a single user-assistant interaction) can include multiple tool_call turns. When a new user turn starts, we remove all the thinking from past user turns, which reduces context length across a multi-turn conversation. But within a user turn (before the final assistant response), we retain the thinking content in all the tool_call turns. You can see the logic around lines 97~100 in our chat_template.jinja.
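To make the rule above concrete, here is a minimal Python sketch of the condition quoted from the template (loop.index0 > ns.last_user_index). It is not the template itself: the helper names last_user_index and reasoning_kept and the sample messages are made up for illustration, and it only models messages that carry a separate "reasoning_content" field.

```python
def last_user_index(messages):
    """Index of the last message with role "user", or -1 if there is none."""
    idx = -1
    for i, message in enumerate(messages):
        if message.get("role") == "user":
            idx = i
    return idx


def reasoning_kept(messages):
    """Indices of assistant messages whose reasoning would still be rendered.

    Mirrors the quoted check: reasoning is rendered only for assistant
    messages that come after the last user message.
    """
    last = last_user_index(messages)
    return [
        i
        for i, message in enumerate(messages)
        if message.get("role") == "assistant"
        and message.get("reasoning_content")
        and i > last
    ]


# Hypothetical conversation: two user turns, each with a tool_call turn.
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "content": "", "reasoning_content": "need the weather tool"},
    {"role": "tool", "content": "18C, cloudy"},
    {"role": "assistant", "content": "It is 18C and cloudy.", "reasoning_content": "summarize the result"},
    {"role": "user", "content": "And in Rome?"},
    {"role": "assistant", "content": "", "reasoning_content": "call the tool again"},
    {"role": "tool", "content": "24C, sunny"},
]

print(reasoning_kept(messages))      # [5]: only the current user turn keeps thinking
print(reasoning_kept(messages[:4]))  # [1, 3]: no later user message, so both are kept
```

So while the first user turn is still the latest one, both of its assistant messages keep their thinking. Once the second user message arrives, only the thinking produced after it is rendered.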
In normal use, simply retain the reasoning part directly in context – there’s no need to modify the logic of the chat template.
Right, this is the question really. In SGLang, for example, if a message turn is: {"role": "assistant", "content": "hello", "reasoning_content":"thinking about something"}
The reasoning will be removed by this chat template in normal use, which means users making requests would instead have to manually rewrite "content" like this: {"role": "assistant", "content": "<think>thinking about something</think>hello"} (then append the next user turn) and omit reasoning_content.
Seems it would be simpler to just render the reasoning_content in the chat template and just allow passing "reasoning_content" (since this is how sglang returns it when using a reasoning parser).
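For reference, the manual rewrite described above could look roughly like the sketch below. The helper name inline_reasoning is hypothetical (it is not part of SGLang or the model repo), and the sketch assumes the <think>...</think> wrapper from the model card.

```python
def inline_reasoning(message):
    """Return a copy of an assistant message with reasoning folded into content.

    Messages without reasoning_content, and non-assistant messages, are
    returned as unchanged copies.
    """
    reasoning = message.get("reasoning_content")
    if message.get("role") != "assistant" or not reasoning:
        return dict(message)
    # Drop the separate field and prepend the wrapped reasoning to the content.
    merged = {k: v for k, v in message.items() if k != "reasoning_content"}
    merged["content"] = f"<think>{reasoning}</think>{message.get('content') or ''}"
    return merged


turn = {"role": "assistant", "content": "hello", "reasoning_content": "thinking about something"}
print(inline_reasoning(turn))
# {'role': 'assistant', 'content': '<think>thinking about something</think>hello'}
```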
Sorry for the confusion! The chat template works whether the reasoning content is in "content" or "reasoning_content". As I mentioned earlier, we only retain the thinking from the latest user turn (removing it from previous turns) to save context in multi-turn conversations.
@chih0321 Sorry if I'm being dense here, but if that's the case it seems everything is working as intended and I don't quite understand the situation described here:
https://x.com/MiniMax__AI/status/1985375617622454566?s=20
In the "Full interleaved thinking" image it seems to indicate that ALL rounds of reasoning content should be preserved.