Personal report
Thanks for this model!
In my personal tests using thinking mode (e.g., riddles and complex reasoning problems requiring precise numerical answers), the fine-tuned version performs better than the original Qwen3-30B v2507. This is the first fine-tuned model I've tested that reaches this level (it simply thinks longer in my benchmark).
I have a question: What is the original model?
It does not function correctly in native mode with tool calls in Open Web UI (where the model mixes tool calls, thinking, and responses).
The default mode resolves this issue, but I prefer the native workflow: think → tool_call → think → response. The default mode consolidates all tool calls into a single 'think' tag (faster, but less accurate for complex requests).
I have completely rewritten the Jinja template several times to get native tool mode working properly. The best configurations I reached are no_think with native tool calls, or think mode with native tool calls where the response stays entirely inside the think tag (nothing is emitted outside it).
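For anyone fighting the same problem, here is a minimal sketch of the kind of assistant-turn fragment I mean (assuming Qwen3-style `<|im_start|>` tokens, a `reasoning_content` field, and `<think>`/`<tool_call>` tags as in the stock template; this is illustrative, not my exact template). The key point is that each `<tool_call>` block is emitted after the closing `</think>`, never inside it:

```jinja
{#- Illustrative assistant turn: reasoning stays inside <think>,
    and each tool call is emitted as its own <tool_call> block
    after the closing </think>, so the client never sees tool
    calls mixed into the think tag. -#}
{%- if message.role == 'assistant' %}
    {{- '<|im_start|>assistant\n' }}
    {%- if message.reasoning_content %}
        {{- '<think>\n' + message.reasoning_content + '\n</think>\n' }}
    {%- endif %}
    {%- if message.content %}
        {{- message.content }}
    {%- endif %}
    {%- for tool_call in message.tool_calls | default([]) %}
        {#- Assumes arguments is a JSON-serializable mapping. -#}
        {{- '\n<tool_call>\n{"name": "' + tool_call.function.name
            + '", "arguments": ' + (tool_call.function.arguments | tojson)
            + '}\n</tool_call>' }}
    {%- endfor %}
    {{- '<|im_end|>\n' }}
{%- endif %}
```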
Has anyone else encountered this issue? I'm using Open Web UI alongside LM Studio.
The original model is Qwen3-30B-A3B-Instruct-2507, and we've noticed this issue with tool calling in thinking mode.
Thanks for the feedback on the model! I'm glad you liked it, and I'm sorry about this issue.
@Elsephire It is a derivative of my merged model, Qwen3-30B-A3B-YOYO-V3. I'm delighted to see it become even better, but it's really disappointing that my contributions weren't mentioned at all.
I used the same technique as DavidAU to create a model with 67 layers (I like the number) and 42B params (the answer to life, the universe, and everything), but I tend to overlook his models because they are focused on creative writing. Instead of making an RP model, I made a coding/STEM model.
I'll make sure to credit David and YOYO, though. I used Claude to write the README and forgot to give it this extra context. I'm sorry to both of them for the missing credit, and I'm thankful for the feedback and support coming from everyone.
6 7
pls kill me /j
I appreciate being credited.
Please note the following:
1 - All my models have a very specific fingerprint.
2 - All models are benchmarked, and these have specific fingerprints.
3 - All models with training (via Unsloth and/or the addition of the Brainstorm adapter) have watermarks.
My colleague and I have tested your 42B model against some of my recently released models.
He has posted the results on Reddit; they are very clear.
With that being said, I expect you to do the right thing.
I have no issue with people using my models, modifying them, merging, learning from them etc etc.
But there is a line.
David
sure