Ollama Modelfile to fix tool calling within GLM-4.7-Flash GGUF
Just thought I'd put this here.
Here's a template Ollama Modelfile that lets you use this GGUF in Ollama with working tool calls (e.g. calling GLM-4.7-Flash from Cline, Roo Code, Kilo Code, etc.)
FROM hf.co/unsloth/GLM-4.7-Flash-GGUF:Q4_K_M
RENDERER glm-4.7
PARSER glm-4.7
TEMPLATE """[gMASK]<sop><|system|>
{{ .System }}<|user|>
{{ .Prompt }}<|assistant|>
{{ .Response }}"""
PARAMETER stop <|user|>
PARAMETER temperature 0.7
PARAMETER top_p 1.0
PARAMETER min_p 0.01
PARAMETER repeat_penalty 1.0
Tutorial on how to create/use this modelfile with ollama
1.) Copy and paste the code above into a .modelfile file (example: "GLM-4.7-Flash.modelfile")
2.) Pull your desired unsloth quant into ollama (using Q4_K_M for this example)
ollama pull hf.co/unsloth/GLM-4.7-Flash-GGUF:Q4_K_M
3.) cd to wherever you saved your .modelfile and run the following (for this example, I'm using the same filename as above)
ollama create GLM-4.7-Flash-GGUF:Q4_K_M -f GLM-4.7-Flash.modelfile
4.) [OPTIONAL] Remove the original GGUF downloaded from HuggingFace
ollama rm hf.co/unsloth/GLM-4.7-Flash-GGUF:Q4_K_M
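To check that tool calls come back structured once the model is created, here is a minimal sketch against Ollama's `/api/chat` endpoint. It assumes the server is on the default `localhost:11434` and that you used the model name from step 3. The `get_weather` tool is hypothetical and only exists to give the model something to call.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # assumed default local endpoint
MODEL = "GLM-4.7-Flash-GGUF:Q4_K_M"  # name passed to `ollama create` in step 3

# Hypothetical tool used only for this check; nothing ever executes it.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_chat_request(model, prompt, tools):
    """Build a non-streaming /api/chat payload that offers the given tools."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "stream": False,
    }


def extract_tool_calls(response):
    """Return (name, arguments) pairs from an /api/chat response dict."""
    calls = response.get("message", {}).get("tool_calls") or []
    return [(c["function"]["name"], c["function"]["arguments"]) for c in calls]


def ask_ollama(payload, url=OLLAMA_CHAT_URL):
    """POST the payload to a running Ollama server and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

With the server running, `extract_tool_calls(ask_ollama(build_chat_request(MODEL, "What's the weather in Paris?", [WEATHER_TOOL])))` should return a non-empty list if tool calling works. If the list is empty and the call text appears in `message.content` instead, the template or parser is probably not being applied.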
Thanks for this! Any Ollama users who stumble across this discussion can refer to it.
Here's another version of the .modelfile I was playing around with. I was trying to re-create the performance I'm getting with Qwen3-Coder 30B, but I really feel like GLM-4.7-Flash is just not a worthy successor at this level of quantization. That said, you're more than welcome to play around with my settings and see what you can tweak :)
FROM hf.co/unsloth/GLM-4.7-Flash-GGUF:Q4_K_M
# We define the template manually to force specific tool behavior compatible with Cline
TEMPLATE """[gMASK]<sop>{{- if .Messages }}
{{- if or .System .Tools }}<|system|>
{{- if .System }}
{{ .System }}
{{- end }}
{{- if .Tools }}
# Tool Instructions
You are a helpful assistant with access to tools. You may call one or more functions to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:
<tools>
{{- range .Tools }}
{"type": "function", "function": {{ .Function }}}
{{- end }}
</tools>
For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
{{- end }}
{{- end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "user" }}<|user|>
{{ .Content }}
{{- else if eq .Role "assistant" }}<|assistant|>
{{- if .Content }}{{ .Content }}{{- end }}
{{- if .ToolCalls }}<tool_call>
{{ range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{ end }}</tool_call>
{{- end }}
{{- else if eq .Role "tool" }}<|observation|>
<tool_response>
{{ .Content }}
</tool_response>
{{- end }}
{{- if and $last (ne .Role "assistant") }}<|assistant|>{{- end }}
{{- end }}
{{- else }}
{{- if .System }}<|system|>
{{ .System }}
{{- end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}
{{- end }}<|assistant|>
{{- end }}{{ .Response }}"""
# Parameters tuned to be stricter (closer to Qwen settings)
PARAMETER temperature 0.6
PARAMETER top_p 0.9
PARAMETER min_p 0.05
PARAMETER top_k 40
PARAMETER repeat_penalty 1.05
# GLM specific stop tokens
PARAMETER stop "<|user|>"
PARAMETER stop "<|observation|>"
PARAMETER stop "<|system|>"
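If the calls ever show up as plain text in the response content rather than as structured tool calls, a small client-side parser for the `<tool_call>` format this template asks for can help with debugging. This is only a sketch: `parse_tool_calls` is a hypothetical helper, and the tool names in the usage example are made up.

```python
import json
import re

# Matches everything between <tool_call> and </tool_call>, across newlines.
TOOL_CALL_BLOCK = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_calls(text):
    """Extract {"name", "arguments"} objects from <tool_call> blocks in raw text."""
    decoder = json.JSONDecoder()
    calls = []
    for block in TOOL_CALL_BLOCK.findall(text):
        pos = 0
        while True:
            # Skip whitespace between objects; one block may hold several calls.
            while pos < len(block) and block[pos].isspace():
                pos += 1
            if pos >= len(block):
                break
            try:
                obj, pos = decoder.raw_decode(block, pos)
            except json.JSONDecodeError:
                break  # malformed JSON: give up on the rest of this block
            if isinstance(obj, dict) and "name" in obj:
                calls.append(
                    {"name": obj["name"], "arguments": obj.get("arguments", {})}
                )
    return calls


# Usage with a hand-written sample (not real model output):
raw = (
    "Let me check.\n<tool_call>\n"
    '{"name": "read_file", "arguments": {"path": "a.txt"}}\n'
    "</tool_call>"
)
print(parse_tool_calls(raw))
```

It accepts several JSON objects inside one block because the template itself renders past tool calls that way, one object per line between a single pair of tags.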