It’s strange that Kimi K2.5 doesn’t show up in the China benchmark, yet here it beats Gemini 3 Flash Thinking, which is pretty impressive.
Ironically, despite their average benchmark scores, GPT OSS 20B and GPT OSS 120B make most other open-source models look cheap and mediocre, and less compelling next to premium closed models.
