Humble request for a stable vLLM/SGLang deployment setup for DeepSeek-V3.2
First of all, thank you to the team and community for the amazing work on DeepSeek-V3.2.
I am currently working on deploying this model for my team using vLLM or SGLang. To avoid common pitfalls and ensure stability, I was wondering if anyone who has successfully deployed this model would be kind enough to share their working configuration?
I would be extremely grateful if you could share a "known-good" setup that I could use as a reference.
My Hardware Environment: [8x H200 141GB]
If possible, could you please provide:
Dependency Versions: The specific versions of vLLM (or SGLang), PyTorch, and Flash-Attention you are using (or the specific docker image tag).
Full Launch Command: The complete command line arguments (including Tensor Parallel, max sequence lengths, and any memory optimization flags).
Environment Variables: Any specific env vars that helped solve performance or compatibility issues.
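For context, the rough shape of the launch I have in mind is something like the following — an untested sketch assembled from vLLM's standard flags, not a verified known-good config (which is exactly what I am asking for):

```shell
# Untested sketch only: standard vLLM flags, NOT a verified known-good config.
# Model repo name and max-model-len are assumptions; adjust to your setup.
vllm serve deepseek-ai/DeepSeek-V3.2-Exp \
    --tensor-parallel-size 8 \
    --max-model-len 131072 \
    --gpu-memory-utilization 0.90 \
    --trust-remote-code
```
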
Your guidance would be a huge timesaver for me and highly appreciated.
Thank you so much for your time and help!
me too
How do I get past this error when calling the server? I started the server following the V3.2-Exp deployment guide and it started successfully.
{'object': 'error',
'message': 'Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating',
'type': 'BadRequest',
'param': None,
'code': 400}
It seems that passing a prompt formatted as described in the README, instead of messages, works.
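Concretely, this means building the raw prompt string yourself and calling the plain completions endpoint with prompt= instead of the chat endpoint with messages=. A minimal single-turn sketch, with the token strings copied from the encoding script (the model name and client call are placeholders for your own setup):

```python
# Special tokens from encoding_dsv32.py
BOS = "<|begin▁of▁sentence|>"

def render_single_turn(system: str, user: str, thinking: bool = False) -> str:
    # System content is emitted verbatim; the user turn is wrapped in
    # <|User|>...<|Assistant|> so the model continues as the assistant.
    # Per the encoding script, the last user turn is followed by <think>
    # in thinking mode, or </think> in chat mode.
    tail = "<think>" if thinking else "</think>"
    return f"{BOS}{system}<|User|>{user}<|Assistant|>{tail}"

prompt = render_single_turn("You are a helpful assistant.", "Hello!")
# response = client.completions.create(model="deepseek-v3.2", prompt=prompt, ...)
```
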
DeepSeek-V3.2 does not need a chat_template. Follow this article: https://docs.sglang.io/basic_usage/deepseek_v32.html
I am using B200 GPUs with an ARM CPU, hosting with the SGLang Docker image (docker pull --platform linux/arm64 lmsysorg/sglang:latest).
When requesting the model with the OpenAI client, follow the instructions at https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-V3_2.html, e.g.:
encode_config = dict(thinking_mode="thinking", drop_thinking=True, add_default_bos_token=True)
# messages -> string
prompt = encode_messages(messages, **encode_config)
response = await client.completions.create(
    model=canonical_name,
    prompt=prompt,
    top_p=top_p,
    temperature=temperature,
    max_tokens=max_tokens,
    stream=False,
)
I used AI to help me convert the encode_messages function in the encoding_dsv32.py script into the Jinja chat template shown below. The template closely mirrors the original Python implementation and works well in most scenarios, including tool calls and structured output generation. However, it does not currently support the default thinking_mode == "thinking": the model remains locked in thinking_mode == "chat" regardless of configuration.
Do you have any suggestions or workarounds to activate the "thinking" mode within this Jinja-based template?
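For what it's worth, the defaulting logic itself behaves as intended in plain Jinja (a set inside an if block at top level persists, since if blocks do not introduce a new scope), as this minimal check shows. So if the mode stays locked to chat, a more likely culprit is that thinking_mode never reaches the template from the serving stack — it may need to be passed through the request's chat_template_kwargs, depending on the server:

```python
from jinja2 import Environment

# Minimal check of the template's default handling for thinking_mode:
# a {% set %} inside {% if %} at template top level persists after the if.
tpl = Environment().from_string(
    "{%- if thinking_mode is not defined -%}"
    "{%- set thinking_mode = 'thinking' -%}"
    "{%- endif -%}"
    "mode={{ thinking_mode }}"
)
print(tpl.render())                      # default applies when nothing is passed
print(tpl.render(thinking_mode="chat"))  # caller-supplied value wins
```
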
{#- ============================================================================
DeepSeek-V3.2 Chat Template
Verbatim conversion from encoding_dsv32.py encode_messages function
Python Function -> Jinja Macro Mapping:
- to_json() -> | tojson (built-in filter)
- tools_from_openai_format() -> tools_from_openai_format()
- tool_calls_from_openai_format() -> tool_calls_from_openai_format()
- encode_arguments_to_dsml() -> encode_arguments_to_dsml()
- render_tools() -> render_tools()
- find_last_user_index() -> find_last_user_index()
- render_message() -> render_message()
- drop_thinking_messages() -> (inline in main body)
- encode_messages() -> Main template body
============================================================================ -#}
{#- ============================================================================
Constants (from Python module-level variables)
bos_token: str = "<|begin▁of▁sentence|>"
eos_token: str = "<|end▁of▁sentence|>"
thinking_start_token: str = "<think>"
thinking_end_token: str = "</think>"
dsml_token: str = "|DSML|"
============================================================================ -#}
{%- set bos_token = "<|begin▁of▁sentence|>" -%}
{%- set eos_token = "<|end▁of▁sentence|>" -%}
{%- set thinking_start_token = "<think>" -%}
{%- set thinking_end_token = "</think>" -%}
{%- set dsml_token = "|DSML|" -%}
{#- ============================================================================
Template strings (from Python module-level variables)
system_msg_template: str = "{content}"
user_msg_template: str = "<|User|>{content}<|Assistant|>"
assistant_msg_template: str = "{reasoning}{content}{tool_calls}<|end▁of▁sentence|>"
thinking_template = "{reasoning_content}"
response_format_template: str = "## Response Format:..."
tool_call_template: str = "<{dsml_token}invoke name=\"{name}\">..."
tool_calls_template = "<{dsml_token}function_calls>..."
tool_output_template: str = "\n<result>{content}</result>"
============================================================================ -#}
{%- set system_msg_template = "{content}" -%}
{%- set user_msg_template = "<|User|>{content}<|Assistant|>" -%}
{%- set assistant_msg_template = "{reasoning}{content}{tool_calls}<|end▁of▁sentence|>" -%}
{%- set thinking_template = "{reasoning_content}" -%}
{%- set response_format_template = "## Response Format:\n\nYou MUST strictly adhere to the following schema to reply:\n{schema}" -%}
{%- set tool_call_template = "<{dsml_token}invoke name=\"{name}\">\n{arguments}\n</{dsml_token}invoke>" -%}
{%- set tool_calls_template = "<{dsml_token}function_calls>\n{tool_calls}\n</{dsml_token}function_calls>" -%}
{%- set tool_output_template = "\n<result>{content}</result>" -%}
{#- ============================================================================
TOOLS_SYSTEM_TEMPLATE (from Python constant)
TOOLS_SYSTEM_TEMPLATE = """## Tools
...
</functions>
"""
Note: Python template ends with </functions>\n (trailing newline)
============================================================================ -#}
{%- set TOOLS_SYSTEM_TEMPLATE -%}
## Tools
You have access to a set of tools you can use to answer the user's question.
You can invoke functions by writing a "<{{ dsml_token }}function_calls>" block like the following as part of your reply to the user:
<{{ dsml_token }}function_calls>
<{{ dsml_token }}invoke name="$FUNCTION_NAME">
<{{ dsml_token }}parameter name="$PARAMETER_NAME" string="true|false">$PARAMETER_VALUE</{{ dsml_token }}parameter>
...
</{{ dsml_token }}invoke>
<{{ dsml_token }}invoke name="$FUNCTION_NAME2">
...
</{{ dsml_token }}invoke>
</{{ dsml_token }}function_calls>
String and scalar parameters should be specified as is without any escaping or quotes, while lists and objects should use JSON format. The "string" attribute should be set to "true" for string type parameters and "false" for other types (numbers, booleans, arrays, objects).
If the thinking_mode is enabled, then after function results you should strongly consider outputting a thinking block. Here is an example:
<{{ dsml_token }}function_calls>
...
</{{ dsml_token }}function_calls>
<function_results>
...
</function_results>
{{ thinking_start_token }}...thinking about results{{ thinking_end_token }}
Here are the functions available in JSONSchema format:
<functions>
{tool_schemas}
</functions>
{%- endset -%}
{#- ============================================================================
Default parameters for encode_messages()
def encode_messages(messages, thinking_mode, context=None,
drop_thinking=True, add_default_bos_token=True)
============================================================================ -#}
{%- if thinking_mode is not defined -%}
{%- set thinking_mode = "thinking" -%}
{%- endif -%}
{%- if drop_thinking is not defined -%}
{%- set drop_thinking = true -%}
{%- endif -%}
{%- if add_default_bos_token is not defined -%}
{%- set add_default_bos_token = true -%}
{%- endif -%}
{#- ============================================================================
Macro: tools_from_openai_format
def tools_from_openai_format(tools):
return [tool["function"] for tool in tools]
============================================================================ -#}
{%- macro tools_from_openai_format(tools) -%}
{%- set ns = namespace(result=[]) -%}
{%- for tool in tools -%}
{%- if tool.function is defined -%}
{%- set ns.result = ns.result + [tool.function] -%}
{%- else -%}
{%- set ns.result = ns.result + [tool] -%}
{%- endif -%}
{%- endfor -%}
{{- ns.result | tojson -}}
{%- endmacro -%}
{#- ============================================================================
Macro: tool_calls_from_openai_format
def tool_calls_from_openai_format(tool_calls):
return [
{
"name": tool_call["function"]["name"],
"arguments": tool_call["function"]["arguments"],
}
for tool_call in tool_calls
]
Note: In Jinja, we return the transformed list directly in render_tool_calls
============================================================================ -#}
{%- macro tool_calls_from_openai_format(tool_calls) -%}
{%- set ns = namespace(result=[]) -%}
{%- for tool_call in tool_calls -%}
{%- if tool_call.function is defined -%}
{%- set item = {"name": tool_call.function.name, "arguments": tool_call.function.arguments} -%}
{%- else -%}
{%- set item = {"name": tool_call.name, "arguments": tool_call.arguments} -%}
{%- endif -%}
{%- set ns.result = ns.result + [item] -%}
{%- endfor -%}
{{- ns.result | tojson -}}
{%- endmacro -%}
{#- ============================================================================
Macro: encode_arguments_to_dsml
def encode_arguments_to_dsml(tool_call: Dict[str, str]) -> str:
p_dsml_template = '<{dsml_token}parameter name="{key}" string="{is_str}">{value}</{dsml_token}parameter>'
P_dsml_strs = []
arguments = json.loads(tool_call["arguments"])
for k, v in arguments.items():
p_dsml_str = p_dsml_template.format(
dsml_token=dsml_token,
key=k,
is_str="true" if isinstance(v, str) else "false",
value=v if isinstance(v, str) else to_json(v),
)
P_dsml_strs.append(p_dsml_str)
return "\n".join(P_dsml_strs)
Note: In Jinja, arguments is already parsed (mapping), not JSON string
============================================================================ -#}
{%- macro encode_arguments_to_dsml(arguments) -%}
{%- set p_dsml_template = "<{dsml_token}parameter name=\"{key}\" string=\"{is_str}\">{value}</{dsml_token}parameter>" -%}
{%- set ns = namespace(P_dsml_strs=[]) -%}
{%- if arguments is mapping -%}
{%- for k, v in arguments.items() -%}
{%- if v is string -%}
{%- set is_str = "true" -%}
{%- set value = v -%}
{%- else -%}
{%- set is_str = "false" -%}
{%- set value = v | tojson -%}
{%- endif -%}
{%- set p_dsml_str = p_dsml_template | replace("{dsml_token}", dsml_token) | replace("{key}", k) | replace("{is_str}", is_str) | replace("{value}", value) -%}
{%- set ns.P_dsml_strs = ns.P_dsml_strs + [p_dsml_str] -%}
{%- endfor -%}
{%- endif -%}
{{- ns.P_dsml_strs | join("\n") -}}
{%- endmacro -%}
{#- ============================================================================
Macro: render_tools
def render_tools(tools: List[Dict[str, Union[str, Dict[str, Any]]]]) -> str:
tools_json = [to_json(t) for t in tools]
return TOOLS_SYSTEM_TEMPLATE.format(
tool_schemas="\n".join(tools_json),
dsml_token=dsml_token,
thinking_start_token=thinking_start_token,
thinking_end_token=thinking_end_token,
)
============================================================================ -#}
{%- macro render_tools(tools) -%}
{#- tools_json = [to_json(t) for t in tools] -#}
{%- set ns = namespace(tools_json=[]) -%}
{%- for tool in tools -%}
{%- if tool.function is defined -%}
{%- set ns.tools_json = ns.tools_json + [tool.function | tojson] -%}
{%- else -%}
{%- set ns.tools_json = ns.tools_json + [tool | tojson] -%}
{%- endif -%}
{%- endfor -%}
{#- return TOOLS_SYSTEM_TEMPLATE.format(tool_schemas="\n".join(tools_json), ...) -#}
{{- TOOLS_SYSTEM_TEMPLATE | replace("{tool_schemas}", ns.tools_json | join("\n")) }}
{% endmacro -%}
{#- ============================================================================
Macro: find_last_user_index
def find_last_user_index(messages: List[Dict[str, Any]]) -> int:
last_user_index = -1
for idx in range(len(messages)-1, -1, -1):
if messages[idx].get("role") in ["user", "developer"]:
last_user_index = idx
break
return last_user_index
Note: Jinja iterates forward, so we update last_user_index on each match
============================================================================ -#}
{%- macro find_last_user_index(messages) -%}
{%- set ns = namespace(last_user_index=-1) -%}
{%- for msg in messages -%}
{%- set role = msg.role if msg.role is defined else msg.get('role') -%}
{%- if role in ['user', 'developer'] -%}
{%- set ns.last_user_index = loop.index0 -%}
{%- endif -%}
{%- endfor -%}
{{- ns.last_user_index -}}
{%- endmacro -%}
{#- ============================================================================
Macro: render_tool_calls_content
Helper macro that implements the tool_calls rendering portion of render_message:
if tool_calls:
tool_calls = [
tool_call_template.format(
dsml_token=dsml_token,
name=tool_call.get("name"),
arguments=encode_arguments_to_dsml(tool_call)
)
for tool_call in tool_calls
]
tool_calls_content += "\n\n" + tool_calls_template.format(
dsml_token=dsml_token,
tool_calls="\n".join(tool_calls)
)
============================================================================ -#}
{%- macro render_tool_calls_content(tool_calls) -%}
{%- set ns = namespace(formatted_calls=[]) -%}
{%- for tool_call in tool_calls -%}
{#- Get name and arguments (handle OpenAI format) -#}
{%- if tool_call.function is defined -%}
{%- set name = tool_call.function.name -%}
{%- set arguments = tool_call.function.arguments -%}
{%- else -%}
{%- set name = tool_call.name -%}
{%- set arguments = tool_call.arguments -%}
{%- endif -%}
{#- tool_call_template.format(dsml_token, name, arguments=encode_arguments_to_dsml(tool_call)) -#}
{%- set formatted_call = tool_call_template | replace("{dsml_token}", dsml_token) | replace("{name}", name) | replace("{arguments}", encode_arguments_to_dsml(arguments)) -%}
{%- set ns.formatted_calls = ns.formatted_calls + [formatted_call] -%}
{%- endfor -%}
{#- tool_calls_template.format(dsml_token, tool_calls="\n".join(tool_calls)) -#}
{{- tool_calls_template | replace("{dsml_token}", dsml_token) | replace("{tool_calls}", ns.formatted_calls | join("\n")) -}}
{%- endmacro -%}
{#- ============================================================================
Macro: render_message
def render_message(index: int, messages: List[Dict[str, Any]], thinking_mode: str) -> str:
assert 0 <= index < len(messages)
assert thinking_mode in ["chat", "thinking"]
prompt = ""
msg = messages[index]
last_user_idx = find_last_user_index(messages)
role = msg.get("role")
content = msg.get("content")
tools = msg.get("tools")
response_format = msg.get("response_format")
tool_calls = msg.get("tool_calls")
reasoning_content = msg.get("reasoning_content")
if tools:
tools = tools_from_openai_format(tools)
if tool_calls:
tool_calls = tool_calls_from_openai_format(tool_calls)
if role == "system":
...
elif role == "developer":
...
elif role == "user":
...
elif role == "tool":
...
elif role == "assistant":
...
return prompt
============================================================================ -#}
{%- macro render_message(index, messages, thinking_mode) -%}
{#- msg = messages[index] -#}
{%- set msg = messages[index] -%}
{#- last_user_idx = find_last_user_index(messages) -#}
{%- set last_user_idx = find_last_user_index(messages) | int -%}
{#- Extract message fields with defaults -#}
{%- set role = msg.role if msg.role is defined else msg.get('role') -%}
{%- set content = msg.content if msg.content is defined else (msg.get('content', '') or '') -%}
{%- set tools = msg.tools if msg.tools is defined else msg.get('tools', []) -%}
{%- set response_format = msg.response_format if msg.response_format is defined else msg.get('response_format') -%}
{%- set tool_calls = msg.tool_calls if msg.tool_calls is defined else msg.get('tool_calls', []) -%}
{%- set reasoning_content = msg.reasoning_content if msg.reasoning_content is defined else (msg.get('reasoning_content', '') or '') -%}
{#- ================================================================
if role == "system":
prompt += system_msg_template.format(content=content or "")
if tools:
prompt += "\n\n" + render_tools(tools)
if response_format:
prompt += "\n\n" + response_format_template.format(schema=to_json(response_format))
================================================================ -#}
{%- if role == 'system' -%}
{{- system_msg_template | replace("{content}", content or '') -}}
{%- if tools -%}
{{- "\n\n" ~ render_tools(tools) -}}
{%- endif -%}
{%- if response_format -%}
{{- "\n\n" ~ response_format_template | replace("{schema}", response_format | tojson) -}}
{%- endif -%}
{#- ================================================================
elif role == "developer":
content_developer = ""
if tools:
content_developer += "\n\n" + render_tools(tools)
if response_format:
content_developer += "\n\n" + response_format_template.format(schema=to_json(response_format))
content_developer += "\n\n# The user's message is: {}".format(content)
prompt += user_msg_template.format(content=content_developer)
if index == last_user_idx and thinking_mode == "thinking":
prompt += thinking_start_token
else:
prompt += thinking_end_token
================================================================ -#}
{%- elif role == 'developer' -%}
{%- set ns = namespace(content_developer="") -%}
{%- if tools -%}
{%- set ns.content_developer = ns.content_developer ~ "\n\n" ~ render_tools(tools) -%}
{%- endif -%}
{%- if response_format -%}
{%- set ns.content_developer = ns.content_developer ~ "\n\n" ~ response_format_template | replace("{schema}", response_format | tojson) -%}
{%- endif -%}
{%- set ns.content_developer = ns.content_developer ~ "\n\n# The user's message is: " ~ content -%}
{{- user_msg_template | replace("{content}", ns.content_developer) -}}
{%- if index == last_user_idx and thinking_mode == "thinking" -%}
{{- thinking_start_token -}}
{%- else -%}
{{- thinking_end_token -}}
{%- endif -%}
{#- ================================================================
elif role == "user":
prompt += user_msg_template.format(content=content)
if index == last_user_idx and thinking_mode == "thinking":
prompt += thinking_start_token
else:
prompt += thinking_end_token
================================================================ -#}
{%- elif role == 'user' -%}
{{- user_msg_template | replace("{content}", content) -}}
{%- if index == last_user_idx and thinking_mode == "thinking" -%}
{{- thinking_start_token -}}
{%- else -%}
{{- thinking_end_token -}}
{%- endif -%}
{#- ================================================================
elif role == "tool":
prev_assistant_idx = index - 1
assistant_msg = messages[prev_assistant_idx]
while prev_assistant_idx >= 0 and assistant_msg.get("role") == "tool":
prev_assistant_idx -= 1
assistant_msg = messages[prev_assistant_idx]
tool_call_order = index - prev_assistant_idx
assistant_tool_calls = assistant_msg.get("tool_calls")
if tool_call_order == 1:
prompt += "\n\n<function_results>"
prompt += tool_output_template.format(content=content)
if tool_call_order == len(assistant_tool_calls):
prompt += "\n</function_results>"
if index >= last_user_idx and thinking_mode == "thinking":
prompt += "\n\n" + thinking_start_token
else:
prompt += "\n\n" + thinking_end_token
================================================================ -#}
{%- elif role == 'tool' -%}
{#- Find previous assistant by scanning backwards -#}
{%- set ns = namespace(prev_assistant_idx=-1) -%}
{%- for i in range(index - 1, -1, -1) -%}
{%- set check_role = messages[i].role if messages[i].role is defined else messages[i].get('role') -%}
{%- if check_role != 'tool' and ns.prev_assistant_idx == -1 -%}
{%- set ns.prev_assistant_idx = i -%}
{%- endif -%}
{%- endfor -%}
{%- set tool_call_order = index - ns.prev_assistant_idx -%}
{%- set assistant_msg = messages[ns.prev_assistant_idx] -%}
{%- set assistant_tool_calls = assistant_msg.tool_calls if assistant_msg.tool_calls is defined else assistant_msg.get('tool_calls', []) -%}
{%- if tool_call_order == 1 -%}
{{- "\n\n<function_results>" -}}
{%- endif -%}
{{- tool_output_template | replace("{content}", content) -}}
{%- if tool_call_order == (assistant_tool_calls | length) -%}
{{- "\n</function_results>" -}}
{%- if index >= last_user_idx and thinking_mode == "thinking" -%}
{{- "\n\n" ~ thinking_start_token -}}
{%- else -%}
{{- "\n\n" ~ thinking_end_token -}}
{%- endif -%}
{%- endif -%}
{#- ================================================================
elif role == "assistant":
thinking_part = ""
tool_calls_content = ""
if tool_calls:
tool_calls = [tool_call_template.format(...) for tool_call in tool_calls]
tool_calls_content += "\n\n" + tool_calls_template.format(...)
summary_content = content or ""
if thinking_mode == "thinking" and index > last_user_idx:
thinking_part = thinking_template.format(reasoning_content=reasoning_content or "") + thinking_end_token
prompt += assistant_msg_template.format(
reasoning=thinking_part,
content=summary_content,
tool_calls=tool_calls_content,
)
================================================================ -#}
{%- elif role == 'assistant' -%}
{%- set ns = namespace(thinking_part="", tool_calls_content="") -%}
{#- Build tool_calls_content if tool_calls present -#}
{%- if tool_calls -%}
{%- set ns.tool_calls_content = "\n\n" ~ render_tool_calls_content(tool_calls) -%}
{%- endif -%}
{%- set summary_content = content or "" -%}
{#- Build thinking_part if in thinking mode and after last user -#}
{%- if thinking_mode == "thinking" and index > last_user_idx -%}
{%- set ns.thinking_part = thinking_template | replace("{reasoning_content}", reasoning_content or "") ~ thinking_end_token -%}
{%- endif -%}
{#- Output using assistant_msg_template -#}
{{- assistant_msg_template | replace("{reasoning}", ns.thinking_part) | replace("{content}", summary_content) | replace("{tool_calls}", ns.tool_calls_content) -}}
{%- endif -%}
{%- endmacro -%}
{#- ============================================================================
Main: encode_messages implementation
def encode_messages(messages: List[Dict[str, Any]], thinking_mode: str,
context: Optional[List[Dict[str, Any]]] = None,
drop_thinking: bool = True,
add_default_bos_token: bool = True) -> str:
context = context if context else []
full_messages = context + messages
prompt = bos_token if add_default_bos_token and len(context) == 0 else ""
if thinking_mode == "thinking" and drop_thinking:
full_messages = drop_thinking_messages(full_messages)
for idx in range(len(messages)):
prompt += render_message(idx + len(context), full_messages, thinking_mode=thinking_mode)
return prompt
============================================================================ -#}
{#- context = context if context else [] (not supported, assume no context) -#}
{#- full_messages = context + messages -#}
{%- set full_messages = messages -%}
{#- if thinking_mode == "thinking" and drop_thinking:
full_messages = drop_thinking_messages(full_messages)
Inline implementation of drop_thinking_messages():
def drop_thinking_messages(messages, last_user_idx=None):
messages_wo_thinking = []
last_user_idx = find_last_user_index(messages) if last_user_idx is None else last_user_idx
for idx, msg in enumerate(messages):
role = msg.get("role")
if role in ["user", "system", "tool"] or idx >= last_user_idx:
messages_wo_thinking.append(msg)
elif role == "assistant":
msg_wo_thinking = copy.copy(msg)
msg_wo_thinking.pop("reasoning_content", None)
messages_wo_thinking.append(msg_wo_thinking)
return messages_wo_thinking
-#}
{%- if thinking_mode == "thinking" and drop_thinking -%}
{%- set orig_last_user_idx = find_last_user_index(full_messages) | int -%}
{%- set ns_drop = namespace(messages_wo_thinking=[]) -%}
{%- for msg in full_messages -%}
{%- set role = msg.role if msg.role is defined else msg.get('role') -%}
{%- if role in ['user', 'system', 'tool'] or loop.index0 >= orig_last_user_idx -%}
{#- messages_wo_thinking.append(msg) -#}
{%- set ns_drop.messages_wo_thinking = ns_drop.messages_wo_thinking + [msg] -%}
{%- elif role == 'assistant' -%}
{#- msg_wo_thinking = copy.copy(msg); msg_wo_thinking.pop("reasoning_content", None) -#}
{%- set msg_wo_thinking = {'role': 'assistant'} -%}
{%- if msg.content is defined -%}
{%- set _ = msg_wo_thinking.update({'content': msg.content}) -%}
{%- elif msg.get('content') -%}
{%- set _ = msg_wo_thinking.update({'content': msg.get('content')}) -%}
{%- endif -%}
{%- if msg.tool_calls is defined -%}
{%- set _ = msg_wo_thinking.update({'tool_calls': msg.tool_calls}) -%}
{%- elif msg.get('tool_calls') -%}
{%- set _ = msg_wo_thinking.update({'tool_calls': msg.get('tool_calls')}) -%}
{%- endif -%}
{%- set ns_drop.messages_wo_thinking = ns_drop.messages_wo_thinking + [msg_wo_thinking] -%}
{%- else -%}
{%- set ns_drop.messages_wo_thinking = ns_drop.messages_wo_thinking + [msg] -%}
{%- endif -%}
{%- endfor -%}
{%- set full_messages = ns_drop.messages_wo_thinking -%}
{%- endif -%}
{#- prompt = bos_token if add_default_bos_token and len(context) == 0 else "" -#}
{%- if add_default_bos_token -%}
{{- bos_token -}}
{%- endif -%}
{#- for idx in range(len(messages)):
prompt += render_message(idx + len(context), full_messages, thinking_mode=thinking_mode) -#}
{%- for msg in full_messages -%}
{{- render_message(loop.index0, full_messages, thinking_mode) -}}
{%- endfor -%}
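For spot-checking the template's tool-call output, the DSML parameter encoding can be mirrored in plain Python. This is a re-implementation of the encode_arguments_to_dsml macro above (like the macro, it takes an already-parsed arguments mapping, not the JSON string the original script receives), not the original script:

```python
import json

DSML = "|DSML|"

def encode_arguments_to_dsml(arguments: dict) -> str:
    """Mirror of the encode_arguments_to_dsml Jinja macro: each argument becomes
    a <|DSML|parameter> tag; string values are emitted as-is, other types as JSON."""
    parts = []
    for key, value in arguments.items():
        is_str = isinstance(value, str)
        rendered = value if is_str else json.dumps(value)
        parts.append(
            f'<{DSML}parameter name="{key}" string="{str(is_str).lower()}">'
            f"{rendered}</{DSML}parameter>"
        )
    return "\n".join(parts)
```
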