zai-org/GLM-4.7

@@ -26,9 +26,9 @@ pipeline_tag: text-generation
 **GLM-4.7**, your new coding partner, is coming with the following features:
-- **Core Coding**: GLM-4.7 brings clear gains, compared to its predecessor GLM-4.6, in multilingual agentic coding and terminal-based tasks, including (73.8%, +5.8%) on SWE-bench, (66.7%, +12.9%) on SWE-bench Multilingual, and (41%, +10.0%) on Terminal Bench. GLM-4.7 also supports thinking before acting, with significant improvements on complex tasks in mainstream agent frameworks such as Claude Code, Kilo Code, Cline, and Roo Code.
+- **Core Coding**: GLM-4.7 brings clear gains, compared to its predecessor GLM-4.6, in multilingual agentic coding and terminal-based tasks, including (73.8%, +5.8%) on SWE-bench, (66.7%, +12.9%) on SWE-bench Multilingual, and (41%, +16.5%) on Terminal Bench 2.0. GLM-4.7 also supports thinking before acting, with significant improvements on complex tasks in mainstream agent frameworks such as Claude Code, Kilo Code, Cline, and Roo Code.
 - **Vibe Coding**: GLM-4.7 takes a big step forward in improving UI quality. It produces cleaner, more modern webpages and generates better-looking slides with more accurate layout and sizing.
-- **Tool Using**: GLM-4.7 achieves significantly improvements in Tool using. Significant better performances can be seen on benchmarks such as τ^2-Bench and on web browsing via BrowserComp.
+- **Tool Using**: GLM-4.7 achieves significantly improvements in Tool using. Significant better performances can be seen on benchmarks such as τ^2-Bench and on web browsing via BrowseComp.
 - **Complex Reasoning**: GLM-4.7 delivers a substantial boost in mathematical and reasoning capabilities, achieving (42.8%, +12.4%) on the HLE (Humanity’s Last Exam) benchmark compared to GLM-4.6.
 You can also see significant improvements in many other scenarios such as chat, creative writing, and role-play scenario.
@@ -57,14 +57,14 @@ You can also see significant improvements in many other scenarios such as chat,
 | BrowseComp-Zh                  |  66.6   |  49.5   |       62.3       |     65.0      |       -        |       42.4        |    63.0    |      -       |
 | τ²-Bench                       |  87.4   |  75.2   |       74.3       |     85.3      |      90.7      |       87.2        |    82.4    |     82.7     |
-> AGI is a long journey, and benchmarks are only one way to evaluate performances. While the metrics provide necessary checkpoints, the most important thing is still how it *feels*. True intelligence isn't just about acing a test or processing data faster; ultimately, the success of AGI will be measured by how seamlessly it integrates into our lives.
+> **Coding:** AGI is a long journey, and benchmarks are only one way to evaluate performance. While the metrics provide necessary checkpoints, the most important thing is still how it *feels*. True intelligence isn't just about acing a test or processing data faster; ultimately, the success of AGI will be measured by how seamlessly it integrates into our lives-**"coding"** this time.
 ## Getting started with GLM-4.7
 ### Interleaved Thinking & Preserved Thinking
-![thinking](https://raw.githubusercontent.com/zai-org/GLM-4.5/refs/heads/main/resources/thinking.png)
+![bench](https://raw.githubusercontent.com/zai-org/GLM-4.5/refs/heads/main/resources/thinking.png)
 GLM-4.7 further enhances **Interleaved Thinking** (a feature introduced since GLM-4.5) and introduces **Preserved Thinking** and **Turn-level Thinking**. By thinking between actions and staying consistent across turns, it makes complex tasks more stable and more controllable:
 - **Interleaved Thinking**: The model thinks before every response and tool calling, improving instruction following and the quality of generation.
@@ -214,5 +214,4 @@ If you find our work useful in your research, please consider citing the followi
       archivePrefix={arXiv},
       primaryClass={cs.CL},
       url={https://arxiv.org/abs/2508.06471},
-}
-```
+}