Humble request for a stable vLLM/SGLang deployment setup for DeepSeek-V3.2
First of all, thank you to the team and community for the amazing work on DeepSeek-V3.2.
I am currently working on deploying this model for my team using vLLM or SGLang. To avoid common pitfalls and ensure stability, I was wondering if anyone who has successfully deployed this model would be kind enough to share their working configuration?
I would be extremely grateful if you could share a "known-good" setup that I could use as a reference.
My Hardware Environment: [8x H200 141GB]
If possible, could you please provide:
Dependency Versions: The specific versions of vLLM (or SGLang), PyTorch, and Flash-Attention you are using (or the specific docker image tag).
Full Launch Command: The complete command line arguments (including Tensor Parallel, max sequence lengths, and any memory optimization flags).
Environment Variables: Any specific env vars that helped solve performance or compatibility issues.
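For context, the rough shape of the launch I have in mind is something like the following — an untested sketch assembled from vLLM's standard flags, not a verified known-good config (which is exactly what I am asking for):

```shell
# Untested sketch only: standard vLLM flags, NOT a verified known-good config.
# Model repo name and max-model-len are assumptions; adjust to your setup.
vllm serve deepseek-ai/DeepSeek-V3.2-Exp \
    --tensor-parallel-size 8 \
    --max-model-len 131072 \
    --gpu-memory-utilization 0.90 \
    --trust-remote-code
```
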
Your guidance would be a huge timesaver for me and highly appreciated.
Thank you so much for your time and help!
me too
How do I get past this error when calling the server? I started the server following the V3.2-Exp deployment guide and it started successfully.
{'object': 'error',
'message': 'Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating',
'type': 'BadRequest',
'param': None,
'code': 400}
It seems that passing a prompt formatted as described in the README, instead of messages, works.
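Concretely, this means building the raw prompt string yourself and calling the plain completions endpoint with prompt= instead of the chat endpoint with messages=. A minimal single-turn sketch, with the token strings copied from the encoding script (the model name and client call are placeholders for your own setup):

```python
# Special tokens from encoding_dsv32.py
BOS = "<|begin▁of▁sentence|>"

def render_single_turn(system: str, user: str, thinking: bool = False) -> str:
    # System content is emitted verbatim; the user turn is wrapped in
    # <|User|>...<|Assistant|> so the model continues as the assistant.
    # Per the encoding script, the last user turn is followed by <think>
    # in thinking mode, or </think> in chat mode.
    tail = "<think>" if thinking else "</think>"
    return f"{BOS}{system}<|User|>{user}<|Assistant|>{tail}"

prompt = render_single_turn("You are a helpful assistant.", "Hello!")
# response = client.completions.create(model="deepseek-v3.2", prompt=prompt, ...)
```
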
DeepSeek-V3.2 does not need a chat_template. Follow this article: https://docs.sglang.io/basic_usage/deepseek_v32.html
I am using B200 GPUs with an ARM CPU, hosting with the SGLang Docker image (docker pull --platform linux/arm64 lmsysorg/sglang:latest).
When requesting the model with the OpenAI client, follow the instructions at https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-V3_2.html, e.g.:
encode_config = dict(thinking_mode="thinking", drop_thinking=True, add_default_bos_token=True)
# messages -> string
prompt = encode_messages(messages, **encode_config)
response = await client.completions.create(
    model=canonical_name,
    prompt=prompt,
    top_p=top_p,
    temperature=temperature,
    max_tokens=max_tokens,
    stream=False,
)
I used AI to help me convert the encode_messages function in the encoding_dsv32.py script into the Jinja chat template shown below. The template closely mirrors the original Python implementation and works well in most scenarios, including tool calls and structured output generation. However, it does not currently support the default thinking_mode == "thinking": the model remains locked in thinking_mode == "chat" regardless of configuration.
Do you have any suggestions or workarounds to activate the "thinking" mode within this Jinja-based template?
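For what it's worth, the defaulting logic itself behaves as intended in plain Jinja (a set inside an if block at top level persists, since if blocks do not introduce a new scope), as this minimal check shows. So if the mode stays locked to chat, a more likely culprit is that thinking_mode never reaches the template from the serving stack — it may need to be passed through the request's chat_template_kwargs, depending on the server:

```python
from jinja2 import Environment

# Minimal check of the template's default handling for thinking_mode:
# a {% set %} inside {% if %} at template top level persists after the if.
tpl = Environment().from_string(
    "{%- if thinking_mode is not defined -%}"
    "{%- set thinking_mode = 'thinking' -%}"
    "{%- endif -%}"
    "mode={{ thinking_mode }}"
)
print(tpl.render())                      # default applies when nothing is passed
print(tpl.render(thinking_mode="chat"))  # caller-supplied value wins
```
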
{#- ============================================================================
DeepSeek-V3.2 Chat Template
Verbatim conversion from encoding_dsv32.py encode_messages function
Python Function -> Jinja Macro Mapping:
- to_json() -> | tojson (built-in filter)
- tools_from_openai_format() -> tools_from_openai_format()
- tool_calls_from_openai_format() -> tool_calls_from_openai_format()
- encode_arguments_to_dsml() -> encode_arguments_to_dsml()
- render_tools() -> render_tools()
- find_last_user_index() -> find_last_user_index()
- render_message() -> render_message()
- drop_thinking_messages() -> (inline in main body)
- encode_messages() -> Main template body
============================================================================ -#}
{#- ============================================================================
Constants (from Python module-level variables)
bos_token: str = "<|begin▁of▁sentence|>"
eos_token: str = "<|end▁of▁sentence|>"
thinking_start_token: str = "<think>"
thinking_end_token: str = "</think>"
dsml_token: str = "|DSML|"
============================================================================ -#}
{%- set bos_token = "<|begin▁of▁sentence|>" -%}
{%- set eos_token = "<|end▁of▁sentence|>" -%}
{%- set thinking_start_token = "<think>" -%}
{%- set thinking_end_token = "</think>" -%}
{%- set dsml_token = "|DSML|" -%}
{#- ============================================================================
Template strings (from Python module-level variables)
system_msg_template: str = "{content}"
user_msg_template: str = "<|User|>{content}<|Assistant|>"
assistant_msg_template: str = "{reasoning}{content}{tool_calls}<|end▁of▁sentence|>"
thinking_template = "{reasoning_content}"
response_format_template: str = "## Response Format:..."
tool_call_template: str = "<{dsml_token}invoke name=\"{name}\">..."
tool_calls_template = "<{dsml_token}function_calls>..."
tool_output_template: str = "\n<result>{content}</result>"
============================================================================ -#}
{%- set system_msg_template = "{content}" -%}
{%- set user_msg_template = "<|User|>{content}<|Assistant|>" -%}
{%- set assistant_msg_template = "{reasoning}{content}{tool_calls}<|end▁of▁sentence|>" -%}
{%- set thinking_template = "{reasoning_content}" -%}
{%- set response_format_template = "## Response Format:\n\nYou MUST strictly adhere to the following schema to reply:\n{schema}" -%}
{%- set tool_call_template = "<{dsml_token}invoke name=\"{name}\">\n{arguments}\n</{dsml_token}invoke>" -%}
{%- set tool_calls_template = "<{dsml_token}function_calls>\n{tool_calls}\n</{dsml_token}function_calls>" -%}
{%- set tool_output_template = "\n<result>{content}</result>" -%}
{#- ============================================================================
TOOLS_SYSTEM_TEMPLATE (from Python constant)
TOOLS_SYSTEM_TEMPLATE = """## Tools
...
</functions>
"""
Note: Python template ends with </functions>\n (trailing newline)
============================================================================ -#}
{%- set TOOLS_SYSTEM_TEMPLATE -%}
## Tools
You have access to a set of tools you can use to answer the user's question.
You can invoke functions by writing a "<{{ dsml_token }}function_calls>" block like the following as part of your reply to the user:
<{{ dsml_token }}function_calls>
<{{ dsml_token }}invoke name="$FUNCTION_NAME">
<{{ dsml_token }}parameter name="$PARAMETER_NAME" string="true|false">$PARAMETER_VALUE</{{ dsml_token }}parameter>
...
</{{ dsml_token }}invoke>
<{{ dsml_token }}invoke name="$FUNCTION_NAME2">
...
</{{ dsml_token }}invoke>
</{{ dsml_token }}function_calls>
String and scalar parameters should be specified as is without any escaping or quotes, while lists and objects should use JSON format. The "string" attribute should be set to "true" for string type parameters and "false" for other types (numbers, booleans, arrays, objects).
If the thinking_mode is enabled, then after function results you should strongly consider outputting a thinking block. Here is an example:
<{{ dsml_token }}function_calls>
...
</{{ dsml_token }}function_calls>
<function_results>
...
</function_results>
{{ thinking_start_token }}...thinking about results{{ thinking_end_token }}
Here are the functions available in JSONSchema format:
<functions>
{tool_schemas}
</functions>
{%- endset -%}
{#- ============================================================================
Default parameters for encode_messages()
def encode_messages(messages, thinking_mode, context=None,
drop_thinking=True, add_default_bos_token=True)
============================================================================ -#}
{%- if thinking_mode is not defined -%}
{%- set thinking_mode = "thinking" -%}
{%- endif -%}
{%- if drop_thinking is not defined -%}
{%- set drop_thinking = true -%}
{%- endif -%}
{%- if add_default_bos_token is not defined -%}
{%- set add_default_bos_token = true -%}
{%- endif -%}
{#- ============================================================================
Macro: tools_from_openai_format
def tools_from_openai_format(tools):
return [tool["function"] for tool in tools]
============================================================================ -#}
{%- macro tools_from_openai_format(tools) -%}
{%- set ns = namespace(result=[]) -%}
{%- for tool in tools -%}
{%- if tool.function is defined -%}
{%- set ns.result = ns.result + [tool.function] -%}
{%- else -%}
{%- set ns.result = ns.result + [tool] -%}
{%- endif -%}
{%- endfor -%}
{{- ns.result | tojson -}}
{%- endmacro -%}
{#- ============================================================================
Macro: tool_calls_from_openai_format
def tool_calls_from_openai_format(tool_calls):
return [
{
"name": tool_call["function"]["name"],
"arguments": tool_call["function"]["arguments"],
}
for tool_call in tool_calls
]
Note: In Jinja, we return the transformed list directly in render_tool_calls
============================================================================ -#}
{%- macro tool_calls_from_openai_format(tool_calls) -%}
{%- set ns = namespace(result=[]) -%}
{%- for tool_call in tool_calls -%}
{%- if tool_call.function is defined -%}
{%- set item = {"name": tool_call.function.name, "arguments": tool_call.function.arguments} -%}
{%- else -%}
{%- set item = {"name": tool_call.name, "arguments": tool_call.arguments} -%}
{%- endif -%}
{%- set ns.result = ns.result + [item] -%}
{%- endfor -%}
{{- ns.result | tojson -}}
{%- endmacro -%}
{#- ============================================================================
Macro: encode_arguments_to_dsml
def encode_arguments_to_dsml(tool_call: Dict[str, str]) -> str:
p_dsml_template = '<{dsml_token}parameter name="{key}" string="{is_str}">{value}</{dsml_token}parameter>'
P_dsml_strs = []
arguments = json.loads(tool_call["arguments"])
for k, v in arguments.items():
p_dsml_str = p_dsml_template.format(
dsml_token=dsml_token,
key=k,
is_str="true" if isinstance(v, str) else "false",
value=v if isinstance(v, str) else to_json(v),
)
P_dsml_strs.append(p_dsml_str)
return "\n".join(P_dsml_strs)
Note: In Jinja, arguments is already parsed (mapping), not JSON string
============================================================================ -#}
{%- macro encode_arguments_to_dsml(arguments) -%}
{%- set p_dsml_template = "<{dsml_token}parameter name=\"{key}\" string=\"{is_str}\">{value}</{dsml_token}parameter>" -%}
{%- set ns = namespace(P_dsml_strs=[]) -%}
{%- if arguments is mapping -%}
{%- for k, v in arguments.items() -%}
{%- if v is string -%}
{%- set is_str = "true" -%}
{%- set value = v -%}
{%- else -%}
{%- set is_str = "false" -%}
{%- set value = v | tojson -%}
{%- endif -%}
{%- set p_dsml_str = p_dsml_template | replace("{dsml_token}", dsml_token) | replace("{key}", k) | replace("{is_str}", is_str) | replace("{value}", value) -%}
{%- set ns.P_dsml_strs = ns.P_dsml_strs + [p_dsml_str] -%}
{%- endfor -%}
{%- endif -%}
{{- ns.P_dsml_strs | join("\n") -}}
{%- endmacro -%}
{#- ============================================================================
Macro: render_tools
def render_tools(tools: List[Dict[str, Union[str, Dict[str, Any]]]]) -> str:
tools_json = [to_json(t) for t in tools]
return TOOLS_SYSTEM_TEMPLATE.format(
tool_schemas="\n".join(tools_json),
dsml_token=dsml_token,
thinking_start_token=thinking_start_token,
thinking_end_token=thinking_end_token,
)
============================================================================ -#}
{%- macro render_tools(tools) -%}
{#- tools_json = [to_json(t) for t in tools] -#}
{%- set ns = namespace(tools_json=[]) -%}
{%- for tool in tools -%}
{%- if tool.function is defined -%}
{%- set ns.tools_json = ns.tools_json + [tool.function | tojson] -%}
{%- else -%}
{%- set ns.tools_json = ns.tools_json + [tool | tojson] -%}
{%- endif -%}
{%- endfor -%}
{#- return TOOLS_SYSTEM_TEMPLATE.format(tool_schemas="\n".join(tools_json), ...) -#}
{{- TOOLS_SYSTEM_TEMPLATE | replace("{tool_schemas}", ns.tools_json | join("\n")) }}
{% endmacro -%}
{#- ============================================================================
Macro: find_last_user_index
def find_last_user_index(messages: List[Dict[str, Any]]) -> int:
last_user_index = -1
for idx in range(len(messages)-1, -1, -1):
if messages[idx].get("role") in ["user", "developer"]:
last_user_index = idx
break
return last_user_index
Note: Jinja iterates forward, so we update last_user_index on each match
============================================================================ -#}
{%- macro find_last_user_index(messages) -%}
{%- set ns = namespace(last_user_index=-1) -%}
{%- for msg in messages -%}
{%- set role = msg.role if msg.role is defined else msg.get('role') -%}
{%- if role in ['user', 'developer'] -%}
{%- set ns.last_user_index = loop.index0 -%}
{%- endif -%}
{%- endfor -%}
{{- ns.last_user_index -}}
{%- endmacro -%}
{#- ============================================================================
Macro: render_tool_calls_content
Helper macro that implements the tool_calls rendering portion of render_message:
if tool_calls:
tool_calls = [
tool_call_template.format(
dsml_token=dsml_token,
name=tool_call.get("name"),
arguments=encode_arguments_to_dsml(tool_call)
)
for tool_call in tool_calls
]
tool_calls_content += "\n\n" + tool_calls_template.format(
dsml_token=dsml_token,
tool_calls="\n".join(tool_calls)
)
============================================================================ -#}
{%- macro render_tool_calls_content(tool_calls) -%}
{%- set ns = namespace(formatted_calls=[]) -%}
{%- for tool_call in tool_calls -%}
{#- Get name and arguments (handle OpenAI format) -#}
{%- if tool_call.function is defined -%}
{%- set name = tool_call.function.name -%}
{%- set arguments = tool_call.function.arguments -%}
{%- else -%}
{%- set name = tool_call.name -%}
{%- set arguments = tool_call.arguments -%}
{%- endif -%}
{#- tool_call_template.format(dsml_token, name, arguments=encode_arguments_to_dsml(tool_call)) -#}
{%- set formatted_call = tool_call_template | replace("{dsml_token}", dsml_token) | replace("{name}", name) | replace("{arguments}", encode_arguments_to_dsml(arguments)) -%}
{%- set ns.formatted_calls = ns.formatted_calls + [formatted_call] -%}
{%- endfor -%}
{#- tool_calls_template.format(dsml_token, tool_calls="\n".join(tool_calls)) -#}
{{- tool_calls_template | replace("{dsml_token}", dsml_token) | replace("{tool_calls}", ns.formatted_calls | join("\n")) -}}
{%- endmacro -%}
{#- ============================================================================
Macro: render_message
def render_message(index: int, messages: List[Dict[str, Any]], thinking_mode: str) -> str:
assert 0 <= index < len(messages)
assert thinking_mode in ["chat", "thinking"]
prompt = ""
msg = messages[index]
last_user_idx = find_last_user_index(messages)
role = msg.get("role")
content = msg.get("content")
tools = msg.get("tools")
response_format = msg.get("response_format")
tool_calls = msg.get("tool_calls")
reasoning_content = msg.get("reasoning_content")
if tools:
tools = tools_from_openai_format(tools)
if tool_calls:
tool_calls = tool_calls_from_openai_format(tool_calls)
if role == "system":
...
elif role == "developer":
...
elif role == "user":
...
elif role == "tool":
...
elif role == "assistant":
...
return prompt
============================================================================ -#}
{%- macro render_message(index, messages, thinking_mode) -%}
{#- msg = messages[index] -#}
{%- set msg = messages[index] -%}
{#- last_user_idx = find_last_user_index(messages) -#}
{%- set last_user_idx = find_last_user_index(messages) | int -%}
{#- Extract message fields with defaults -#}
{%- set role = msg.role if msg.role is defined else msg.get('role') -%}
{%- set content = msg.content if msg.content is defined else (msg.get('content', '') or '') -%}
{%- set tools = msg.tools if msg.tools is defined else msg.get('tools', []) -%}
{%- set response_format = msg.response_format if msg.response_format is defined else msg.get('response_format') -%}
{%- set tool_calls = msg.tool_calls if msg.tool_calls is defined else msg.get('tool_calls', []) -%}
{%- set reasoning_content = msg.reasoning_content if msg.reasoning_content is defined else (msg.get('reasoning_content', '') or '') -%}
{#- ================================================================
if role == "system":
prompt += system_msg_template.format(content=content or "")
if tools:
prompt += "\n\n" + render_tools(tools)
if response_format:
prompt += "\n\n" + response_format_template.format(schema=to_json(response_format))
================================================================ -#}
{%- if role == 'system' -%}
{{- system_msg_template | replace("{content}", content or '') -}}
{%- if tools -%}
{{- "\n\n" ~ render_tools(tools) -}}
{%- endif -%}
{%- if response_format -%}
{{- "\n\n" ~ response_format_template | replace("{schema}", response_format | tojson) -}}
{%- endif -%}
{#- ================================================================
elif role == "developer":
content_developer = ""
if tools:
content_developer += "\n\n" + render_tools(tools)
if response_format:
content_developer += "\n\n" + response_format_template.format(schema=to_json(response_format))
content_developer += "\n\n# The user's message is: {}".format(content)
prompt += user_msg_template.format(content=content_developer)
if index == last_user_idx and thinking_mode == "thinking":
prompt += thinking_start_token
else:
prompt += thinking_end_token
================================================================ -#}
{%- elif role == 'developer' -%}
{%- set ns = namespace(content_developer="") -%}
{%- if tools -%}
{%- set ns.content_developer = ns.content_developer ~ "\n\n" ~ render_tools(tools) -%}
{%- endif -%}
{%- if response_format -%}
{%- set ns.content_developer = ns.content_developer ~ "\n\n" ~ response_format_template | replace("{schema}", response_format | tojson) -%}
{%- endif -%}
{%- set ns.content_developer = ns.content_developer ~ "\n\n# The user's message is: " ~ content -%}
{{- user_msg_template | replace("{content}", ns.content_developer) -}}
{%- if index == last_user_idx and thinking_mode == "thinking" -%}
{{- thinking_start_token -}}
{%- else -%}
{{- thinking_end_token -}}
{%- endif -%}
{#- ================================================================
elif role == "user":
prompt += user_msg_template.format(content=content)
if index == last_user_idx and thinking_mode == "thinking":
prompt += thinking_start_token
else:
prompt += thinking_end_token
================================================================ -#}
{%- elif role == 'user' -%}
{{- user_msg_template | replace("{content}", content) -}}
{%- if index == last_user_idx and thinking_mode == "thinking" -%}
{{- thinking_start_token -}}
{%- else -%}
{{- thinking_end_token -}}
{%- endif -%}
{#- ================================================================
elif role == "tool":
prev_assistant_idx = index - 1
assistant_msg = messages[prev_assistant_idx]
while prev_assistant_idx >= 0 and assistant_msg.get("role") == "tool":
prev_assistant_idx -= 1
assistant_msg = messages[prev_assistant_idx]
tool_call_order = index - prev_assistant_idx
assistant_tool_calls = assistant_msg.get("tool_calls")
if tool_call_order == 1:
prompt += "\n\n<function_results>"
prompt += tool_output_template.format(content=content)
if tool_call_order == len(assistant_tool_calls):
prompt += "\n</function_results>"
if index >= last_user_idx and thinking_mode == "thinking":
prompt += "\n\n" + thinking_start_token
else:
prompt += "\n\n" + thinking_end_token
================================================================ -#}
{%- elif role == 'tool' -%}
{#- Find previous assistant by scanning backwards -#}
{%- set ns = namespace(prev_assistant_idx=-1) -%}
{%- for i in range(index - 1, -1, -1) -%}
{%- set check_role = messages[i].role if messages[i].role is defined else messages[i].get('role') -%}
{%- if check_role != 'tool' and ns.prev_assistant_idx == -1 -%}
{%- set ns.prev_assistant_idx = i -%}
{%- endif -%}
{%- endfor -%}
{%- set tool_call_order = index - ns.prev_assistant_idx -%}
{%- set assistant_msg = messages[ns.prev_assistant_idx] -%}
{%- set assistant_tool_calls = assistant_msg.tool_calls if assistant_msg.tool_calls is defined else assistant_msg.get('tool_calls', []) -%}
{%- if tool_call_order == 1 -%}
{{- "\n\n<function_results>" -}}
{%- endif -%}
{{- tool_output_template | replace("{content}", content) -}}
{%- if tool_call_order == (assistant_tool_calls | length) -%}
{{- "\n</function_results>" -}}
{%- if index >= last_user_idx and thinking_mode == "thinking" -%}
{{- "\n\n" ~ thinking_start_token -}}
{%- else -%}
{{- "\n\n" ~ thinking_end_token -}}
{%- endif -%}
{%- endif -%}
{#- ================================================================
elif role == "assistant":
thinking_part = ""
tool_calls_content = ""
if tool_calls:
tool_calls = [tool_call_template.format(...) for tool_call in tool_calls]
tool_calls_content += "\n\n" + tool_calls_template.format(...)
summary_content = content or ""
if thinking_mode == "thinking" and index > last_user_idx:
thinking_part = thinking_template.format(reasoning_content=reasoning_content or "") + thinking_end_token
prompt += assistant_msg_template.format(
reasoning=thinking_part,
content=summary_content,
tool_calls=tool_calls_content,
)
================================================================ -#}
{%- elif role == 'assistant' -%}
{%- set ns = namespace(thinking_part="", tool_calls_content="") -%}
{#- Build tool_calls_content if tool_calls present -#}
{%- if tool_calls -%}
{%- set ns.tool_calls_content = "\n\n" ~ render_tool_calls_content(tool_calls) -%}
{%- endif -%}
{%- set summary_content = content or "" -%}
{#- Build thinking_part if in thinking mode and after last user -#}
{%- if thinking_mode == "thinking" and index > last_user_idx -%}
{%- set ns.thinking_part = thinking_template | replace("{reasoning_content}", reasoning_content or "") ~ thinking_end_token -%}
{%- endif -%}
{#- Output using assistant_msg_template -#}
{{- assistant_msg_template | replace("{reasoning}", ns.thinking_part) | replace("{content}", summary_content) | replace("{tool_calls}", ns.tool_calls_content) -}}
{%- endif -%}
{%- endmacro -%}
{#- ============================================================================
Main: encode_messages implementation
def encode_messages(messages: List[Dict[str, Any]], thinking_mode: str,
context: Optional[List[Dict[str, Any]]] = None,
drop_thinking: bool = True,
add_default_bos_token: bool = True) -> str:
context = context if context else []
full_messages = context + messages
prompt = bos_token if add_default_bos_token and len(context) == 0 else ""
if thinking_mode == "thinking" and drop_thinking:
full_messages = drop_thinking_messages(full_messages)
for idx in range(len(messages)):
prompt += render_message(idx + len(context), full_messages, thinking_mode=thinking_mode)
return prompt
============================================================================ -#}
{#- context = context if context else [] (not supported, assume no context) -#}
{#- full_messages = context + messages -#}
{%- set full_messages = messages -%}
{#- if thinking_mode == "thinking" and drop_thinking:
full_messages = drop_thinking_messages(full_messages)
Inline implementation of drop_thinking_messages():
def drop_thinking_messages(messages, last_user_idx=None):
messages_wo_thinking = []
last_user_idx = find_last_user_index(messages) if last_user_idx is None else last_user_idx
for idx, msg in enumerate(messages):
role = msg.get("role")
if role in ["user", "system", "tool"] or idx >= last_user_idx:
messages_wo_thinking.append(msg)
elif role == "assistant":
msg_wo_thinking = copy.copy(msg)
msg_wo_thinking.pop("reasoning_content", None)
messages_wo_thinking.append(msg_wo_thinking)
return messages_wo_thinking
-#}
{%- if thinking_mode == "thinking" and drop_thinking -%}
{%- set orig_last_user_idx = find_last_user_index(full_messages) | int -%}
{%- set ns_drop = namespace(messages_wo_thinking=[]) -%}
{%- for msg in full_messages -%}
{%- set role = msg.role if msg.role is defined else msg.get('role') -%}
{%- if role in ['user', 'system', 'tool'] or loop.index0 >= orig_last_user_idx -%}
{#- messages_wo_thinking.append(msg) -#}
{%- set ns_drop.messages_wo_thinking = ns_drop.messages_wo_thinking + [msg] -%}
{%- elif role == 'assistant' -%}
{#- msg_wo_thinking = copy.copy(msg); msg_wo_thinking.pop("reasoning_content", None) -#}
{%- set msg_wo_thinking = {'role': 'assistant'} -%}
{%- if msg.content is defined -%}
{%- set _ = msg_wo_thinking.update({'content': msg.content}) -%}
{%- elif msg.get('content') -%}
{%- set _ = msg_wo_thinking.update({'content': msg.get('content')}) -%}
{%- endif -%}
{%- if msg.tool_calls is defined -%}
{%- set _ = msg_wo_thinking.update({'tool_calls': msg.tool_calls}) -%}
{%- elif msg.get('tool_calls') -%}
{%- set _ = msg_wo_thinking.update({'tool_calls': msg.get('tool_calls')}) -%}
{%- endif -%}
{%- set ns_drop.messages_wo_thinking = ns_drop.messages_wo_thinking + [msg_wo_thinking] -%}
{%- else -%}
{%- set ns_drop.messages_wo_thinking = ns_drop.messages_wo_thinking + [msg] -%}
{%- endif -%}
{%- endfor -%}
{%- set full_messages = ns_drop.messages_wo_thinking -%}
{%- endif -%}
{#- prompt = bos_token if add_default_bos_token and len(context) == 0 else "" -#}
{%- if add_default_bos_token -%}
{{- bos_token -}}
{%- endif -%}
{#- for idx in range(len(messages)):
prompt += render_message(idx + len(context), full_messages, thinking_mode=thinking_mode) -#}
{%- for msg in full_messages -%}
{{- render_message(loop.index0, full_messages, thinking_mode) -}}
{%- endfor -%}
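For spot-checking the template's tool-call output, the DSML parameter encoding can be mirrored in plain Python. This is a re-implementation of the encode_arguments_to_dsml macro above (like the macro, it takes an already-parsed arguments mapping, not the JSON string the original script receives), not the original script:

```python
import json

DSML = "|DSML|"

def encode_arguments_to_dsml(arguments: dict) -> str:
    """Mirror of the encode_arguments_to_dsml Jinja macro: each argument becomes
    a <|DSML|parameter> tag; string values are emitted as-is, other types as JSON."""
    parts = []
    for key, value in arguments.items():
        is_str = isinstance(value, str)
        rendered = value if is_str else json.dumps(value)
        parts.append(
            f'<{DSML}parameter name="{key}" string="{str(is_str).lower()}">'
            f"{rendered}</{DSML}parameter>"
        )
    return "\n".join(parts)
```
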