Instead of an architectural upgrade, each major model drop nowadays perfects a regional innovation. What Kimi brought into the spotlight this time is quantization-aware training (QAT). I wrote an article explaining what it is and why it matters for reasoning models.
https://huggingface.co/blog/onekq/qat-bonsai
If you are interested in this kind of post, I will next introduce the Muon optimizer, another technology behind Kimi's success.