So far so good, but the CoT rambling is way too much (+ question)
Congrats on the release! I like the writing style; it feels a lot less artificial than most other recent models. The model feels okay for its size, though I obviously haven't had a chance to really test it much yet. But so far I like it.
My only pet peeve is the CoT. Either I'm using the wrong sampling settings (I tested a few) or it hasn't been tuned against runaway / looping CoT. It too often gets into the thousands of tokens for simple tasks, which is really wasteful. I thought Qwen was bad, but this is a whole level above it. Was it trained with some native way to disable CoT, like /nothink in Qwen? Normally I'd prefill the start/end think tags with nothing in between, but sadly that doesn't work most of the time with your model (blank generation). Or is there maybe a reasoning "effort" setting?
Another thing: looking at the Jinja template, you have an "environment" role set up. Was it used during training, and if so, what's its purpose? Is it a form of system message?
Edit: Prefilling responses with the line below seems to help modulate the reasoning effort. No idea how damaging it is to CoT quality, but worth sharing.
<think>\nOkay, I'll keep my thinking short.
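For anyone who wants to try it, here's a rough sketch of how I'm applying the prefill. It assumes an OpenAI-compatible local server that continues a trailing assistant message instead of starting a fresh turn (not all backends do this); the endpoint URL, model name, and prompt are placeholders.

```python
# Sketch: prefill the assistant turn to nudge the model toward a short CoT.
# Assumes an OpenAI-compatible server (e.g. a local llama.cpp / vLLM instance)
# that continues a trailing assistant message rather than opening a new one.
# URL and model name are placeholders.
import requests

PREFILL = "<think>\nOkay, I'll keep my thinking short."

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "local-model",
        "messages": [
            {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."},
            # Trailing assistant message acts as the prefill; the model should
            # continue from here instead of generating its own <think> opener.
            {"role": "assistant", "content": PREFILL},
        ],
        "max_tokens": 512,
        "temperature": 0.7,
    },
    timeout=120,
)

print(PREFILL + resp.json()["choices"][0]["message"]["content"])
```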
We're working on this for future versions @SerialKicked! I agree it is a big-time yapper, especially for easy queries. The length is more in line with expectations for math+coding+reasoning queries.
For what it's worth, the longer the total context is, the worse it seems to get. It's not as bad in one-shot interactions.
Still, it's quite an achievement. A fully open 32B CoT model wasn't on my bingo card this year. Good luck to you all.