Ollama Modelfile to fix tool calling within GLM-4.7-Flash GGUF

#23

pinned

by CesarR70 - opened 25 days ago

Just thought I'd put this here.
Here's a template Ollama Modelfile that lets you run this GGUF in Ollama with working tool calls (for example, when calling GLM-4.7-Flash from Cline, Roo Code, Kilo Code, etc.).

FROM hf.co/unsloth/GLM-4.7-Flash-GGUF:Q4_K_M

RENDERER glm-4.7
PARSER glm-4.7

TEMPLATE """[gMASK]<sop><|system|>
{{ .System }}<|user|>
{{ .Prompt }}<|assistant|>
{{ .Response }}"""

PARAMETER stop <|user|>

PARAMETER temperature 0.7
PARAMETER top_p 1.0
PARAMETER min_p 0.01
PARAMETER repeat_penalty 1.0
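
Once the model has been created from a Modelfile like the one above, one way to check that tool calls come back structured is to send a request with a `tools` list to Ollama's `/api/chat` endpoint. The following is only a minimal sketch: it assumes a local Ollama server on the default port 11434 and the model name `GLM-4.7-Flash-GGUF:Q4_K_M` used in the tutorial in this thread, and the `get_weather` tool is a made-up example, not something from the model or from Cline.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumption: default local Ollama address
MODEL = "GLM-4.7-Flash-GGUF:Q4_K_M"  # name passed to `ollama create` in this thread

# Hypothetical tool definition, for illustration only.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_payload(prompt, tools, model=MODEL):
    """Build a non-streaming /api/chat request body that offers tools to the model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "stream": False,
    }


def extract_tool_calls(response):
    """Return (name, arguments) pairs from a decoded /api/chat response."""
    calls = response.get("message", {}).get("tool_calls") or []
    return [(c["function"]["name"], c["function"]["arguments"]) for c in calls]


def chat(payload, url=OLLAMA_URL):
    """POST the payload to a running Ollama server and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, `extract_tool_calls(chat(build_payload("What is the weather in Paris?", [WEATHER_TOOL])))` should return a non-empty list if tool calling is working. An empty list with the call showing up as plain text in the message content would suggest the tool calls are not being parsed.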

CesarR70

25 days ago

•

edited 24 days ago

Tutorial: how to create and use this modelfile with Ollama

1.) Copy/paste the Modelfile above into a .modelfile file (example: "GLM-4.7-Flash.modelfile")

2.) Pull your desired Unsloth quant into Ollama (this example uses Q4_K_M)

ollama pull hf.co/unsloth/GLM-4.7-Flash-GGUF:Q4_K_M

3.) cd to wherever you saved your .modelfile and run the following (this example uses the same filename as above)

ollama create GLM-4.7-Flash-GGUF:Q4_K_M -f GLM-4.7-Flash.modelfile

4.) [OPTIONAL] Remove the original GGUF entry that was pulled from Hugging Face

ollama rm hf.co/unsloth/GLM-4.7-Flash-GGUF:Q4_K_M
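
If you want to repeat the steps above for a different quant, the commands only differ in the tag. Here is a small sketch that builds the same three commands for any tag. The function names are just illustrative, and it assumes the tag you pass actually exists in the Unsloth repository. Remember that the `FROM` line in your .modelfile needs the same tag.

```python
import shlex

REPO = "hf.co/unsloth/GLM-4.7-Flash-GGUF"  # repository used in this thread


def ollama_steps(quant="Q4_K_M", modelfile="GLM-4.7-Flash.modelfile", remove_original=False):
    """Return the tutorial's commands as argv lists for the given quant tag."""
    source = f"{REPO}:{quant}"
    local_name = f"GLM-4.7-Flash-GGUF:{quant}"
    steps = [
        ["ollama", "pull", source],                         # step 2
        ["ollama", "create", local_name, "-f", modelfile],  # step 3
    ]
    if remove_original:
        steps.append(["ollama", "rm", source])              # step 4 (optional)
    return steps


def as_shell(steps):
    """Render the argv lists as copy-pasteable shell lines."""
    return "\n".join(shlex.join(argv) for argv in steps)


print(as_shell(ollama_steps(remove_original=True)))
```

Each argv list can also be passed to `subprocess.run(argv, check=True)` if you prefer to run the steps from Python instead of pasting them into a terminal.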

shimmyshimmer pinned discussion 25 days ago

shimmyshimmer

Unsloth AI org 25 days ago

Thanks for this! Any Ollama users who stumble across this discussion can refer to it.

CesarR70

21 days ago

Here's another version of the .modelfile I was playing around with. I was trying to recreate the performance I get with Qwen3-Coder 30B, but I really feel that GLM-4.7-Flash is just not a worthy successor at this level of quantization. That said, you're more than welcome to play around with my settings and see what you can tweak
:)

FROM hf.co/unsloth/GLM-4.7-Flash-GGUF:Q4_K_M

# We define the template manually to force specific tool behavior compatible with Cline
TEMPLATE """[gMASK]<sop>{{- if .Messages }}
{{- if or .System .Tools }}<|system|>
{{- if .System }}
{{ .System }}
{{- end }}
{{- if .Tools }}

# Tool Instructions

You are a helpful assistant with access to tools. You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{{- range .Tools }}
{"type": "function", "function": {{ .Function }}}
{{- end }}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
{{- end }}
{{- end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "user" }}<|user|>
{{ .Content }}
{{- else if eq .Role "assistant" }}<|assistant|>
{{- if .Content }}{{ .Content }}{{- end }}
{{- if .ToolCalls }}<tool_call>
{{ range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{ end }}</tool_call>
{{- end }}
{{- else if eq .Role "tool" }}<|observation|>
<tool_response>
{{ .Content }}
</tool_response>
{{- end }}
{{- if and $last (ne .Role "assistant") }}<|assistant|>
{{- end }}
{{- end }}
{{- else }}
{{- if .System }}<|system|>
{{ .System }}
{{- end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}
{{- end }}<|assistant|>
{{- end }}{{ .Response }}"""

# Parameters tuned to be stricter (closer to Qwen settings)
PARAMETER temperature 0.6
PARAMETER top_p 0.9
PARAMETER min_p 0.05
PARAMETER top_k 40
PARAMETER repeat_penalty 1.05

# GLM specific stop tokens
PARAMETER stop "<|user|>"
PARAMETER stop "<|observation|>"
PARAMETER stop "<|system|>"
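
For anyone wondering what the `<tool_call>` format in this template looks like on the receiving end, here is a rough sketch of pulling the calls out of raw model text. It is just an illustration of the format the template asks for, not how Ollama or Cline parse it internally, and the `read_file` tool in the sample is hypothetical.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_calls(text):
    """Extract {"name", "arguments"} objects from <tool_call> blocks in raw model output."""
    decoder = json.JSONDecoder()
    calls = []
    for block in TOOL_CALL_RE.findall(text):
        pos = 0
        while True:
            # Skip whitespace between consecutive JSON objects in one block.
            while pos < len(block) and block[pos].isspace():
                pos += 1
            if pos >= len(block):
                break
            try:
                obj, pos = decoder.raw_decode(block, pos)
            except json.JSONDecodeError:
                break  # malformed JSON: give up on the rest of this block
            if isinstance(obj, dict) and "name" in obj:
                calls.append({"name": obj["name"], "arguments": obj.get("arguments", {})})
    return calls


# Sample text in the shape the template requests (hypothetical tool name).
sample = (
    "I'll read the file first.\n"
    "<tool_call>\n"
    '{"name": "read_file", "arguments": {"path": "README.md"}}\n'
    "</tool_call>"
)
print(parse_tool_calls(sample))
```

The inner loop handles several JSON objects inside one `<tool_call>` block, which is how the template above renders multiple calls from earlier assistant turns.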
