Personal report
Thanks for this model!
In my personal tests using thinking mode (e.g., riddles and complex reasoning problems requiring precise numerical answers), the fine-tuned version performs better than the original Qwen3-30B v2507. This is the first fine-tuned model I've tested that reaches this level (it simply thinks longer in my benchmark).
I have a question: What is the original model?
It does not function correctly in native mode with tool calls in Open Web UI (where the model mixes tool calls, thinking, and responses).
The default mode resolves this issue, but I prefer the native workflow: think → tool_call → think → response. The default mode consolidates all tool calls into a single 'think' tag (faster, but less accurate for complex requests).
I have completely rewritten the Jinja template several times to get native tool mode working properly. The best configurations I reached are no_think with native tool calls, or think mode with native tool calls where the response stays entirely inside the think tag (nothing is emitted outside it).
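For anyone fighting the same problem, here is a minimal sketch of the kind of assistant-turn fragment I mean (assuming Qwen3-style `<|im_start|>` tokens, a `reasoning_content` field, and `<think>`/`<tool_call>` tags as in the stock template; this is illustrative, not my exact template). The key point is that each `<tool_call>` block is emitted after the closing `</think>`, never inside it:

```jinja
{#- Illustrative assistant turn: reasoning stays inside <think>,
    and each tool call is emitted as its own <tool_call> block
    after the closing </think>, so the client never sees tool
    calls mixed into the think tag. -#}
{%- if message.role == 'assistant' %}
    {{- '<|im_start|>assistant\n' }}
    {%- if message.reasoning_content %}
        {{- '<think>\n' + message.reasoning_content + '\n</think>\n' }}
    {%- endif %}
    {%- if message.content %}
        {{- message.content }}
    {%- endif %}
    {%- for tool_call in message.tool_calls | default([]) %}
        {#- Assumes arguments is a JSON-serializable mapping. -#}
        {{- '\n<tool_call>\n{"name": "' + tool_call.function.name
            + '", "arguments": ' + (tool_call.function.arguments | tojson)
            + '}\n</tool_call>' }}
    {%- endfor %}
    {{- '<|im_end|>\n' }}
{%- endif %}
```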
Has anyone else encountered this issue? I'm using Open Web UI alongside LM Studio.
The original model is Qwen3-30B-A3B-Instruct-2507, and we've noticed this issue with tool calling in thinking mode.
Thanks for the feedback on the model! I'm glad you liked it, and I'm sorry about this issue.
@Elsephire It is a derivative of my merged model, Qwen3-30B-A3B-YOYO-V3. I'm delighted to see it become even better, but it's really disappointing that my contributions weren't mentioned at all.
I used the same technique as DavidAU to create a model with 67 layers (I like the number) and 42B params (the answer to life, the universe, and everything), but I tend to overlook his models because they are focused on creative writing. Instead of making an RP model, I made a coding/STEM model.
I'll make sure to credit David and YOYO, though. I used Claude to write the README and forgot to give it this extra context. I'm sorry to both of them for the missing credit, and I'm thankful for the feedback and support coming from everyone.
6 7
pls kill me /j
I appreciate being credited.
Please note the following:
1 - All my models have a very specific fingerprint.
2 - All models are benchmarked, and these have specific fingerprints.
3 - All models with training (via Unsloth and/or the addition of the Brainstorm adapter) have watermarks.
My colleague and I have tested your 42B model against some of my recently released models.
He has posted the results on Reddit; they are very clear.
With that being said, I expect you to do the right thing.
I have no issue with people using my models, modifying them, merging, learning from them etc etc.
But there is a line.
David
sure