unfortunately not stable enough
Hello z12,
First of all, HUGE thanks for your fantastic models!
My storage always includes several of your merged gems, from MT1 through MT5.
I’ve been testing Gemma 3 12B—including most of the Abliterated versions—and unfortunately, they feel unstable and quite far from the quality of Gemma 2.
Your MT-Gemma-3-12B model, for example, isn’t in the same tier as your great Gemma 2 work (e.g., it sometimes responds in two different languages, among other quirks).
To be clear, Google’s original Gemma 3 is great—definitely strong enough on its own. So right now, it feels like we have two real options:
Stick with Google's original models.
Wait for someone to crack the advanced methods for reducing or removing rejection.
I'm trying too, but the hurdle is really high.
Ref: mlabonne’s Abliterated version
“The model was abliterated by computing a refusal direction based on hidden states (inspired by Sumandora’s repo) for most layers (3 to 45), independently. Combined with a refusal weight of 0.6…”
I deeply respect his work—books, models, and papers—but honestly, this Abliterated version feels far off from what the description suggests. Even he notes it’s experimental:
"It might not turn out as well as expected... I saw some garbled text from time to time (e.g., 'It' my' instead of 'It's my')."
My guess is that hidden states contribute heavily to response quality—so any edits become a tightrope walk. Even changing weight values slightly can degrade output.
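The mean-difference trick described in that quote can be sketched in a few lines of numpy. This toy version uses random stand-in activations (shapes and values are made up; real ones would be cached hidden states from harmful vs. harmless prompts), and it also illustrates the tightrope: the projection that removes the refusal component removes everything else that happens to lie along that direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in activations: in practice these come from caching one layer's
# hidden states on harmful vs. harmless prompts. Shapes/values are invented.
harmful = rng.normal(size=(64, 256)) + 1.0    # cluster shifted along a direction
harmless = rng.normal(size=(64, 256))

# "Refusal direction" = normalized difference of the two mean activations.
direction = harmful.mean(axis=0) - harmless.mean(axis=0)
direction /= np.linalg.norm(direction)

def ablate(h, d, weight=0.6):
    """Subtract weight * (projection of h onto d) from each hidden state."""
    return h - weight * np.outer(h @ d, d)

# With weight=1.0 the component along the direction is removed entirely;
# with 0.6 (as in the quoted description) it is only partially suppressed.
full = ablate(harmful, direction, weight=1.0)
print(float(np.abs(full @ direction).max()))  # ~0: the component is gone
```

With the refusal weight at 0.6 instead of 1.0, 60% of that component is subtracted, which is exactly why small changes to the weight value can shift output quality.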
Just wanted to share where I’m at. Keep up the amazing work as always!
In general, the first generation (gen-0) of merges I did was just a test to see which models could be used in a merge.
For example, one model threw an error when I tried to merge it, and some models differed in parameter count.
In generation 1 (gen-1) I verified that many models could be merged without errors; next I will move on to standard merge experiments.
Hopefully, the quality of the answers will increase with further attempts, and more fine-tuned models will appear.
Thanks for the update!!!
It’s great to hear you're testing merge compatibility across different bases—laying that groundwork really opens up future possibilities. I’ll keep trying each and every release.
My own attempt at Gemma3 is progressing.
Gemma3 has a novel architecture, so I created Gemma3-specific transformers and refusal-removal scripts.
Hoping more strong fine-tuned models appear soon so we can all push the next phase forward.
Looking forward to your masterful models—your work is always inspiring!
Spasibo
-otkd
Looking forward to seeing your version of the model.
And the fact that the model answers in two different languages may be related to IlyaGusev/saiga_gemma3_12b, as this model was trained mainly in Russian.
Thx again for great reminder!
Below is my current impression.
⚡️Gemma3 12B
My feeling is that the refusal-related weights in 12B serve not only refusal but also boost output quality. Many other models are like this too, but G3 12B shows the tendency more strongly. G3 27B has a larger margin against weight modification.
But G3 12B is excellent at delicate expression and first-person perspective. (G3 27B is more generic and safer, with milder outputs.)
That is the best part of G3 12B (G2 9B too), and lower refusal means wider usage, so I still mess around with G3 12B.
So far, my 48 GB of VRAM is not enough for precise weight adjustment. I am building a 60 GB VRAM environment, but my current estimate is that the existing refusal libraries are not a good solution for G3 12B. Precise adjustment means TransformerLens, which is very VRAM-hungry; I think 120 GB (or even more) is needed for a 12B-sized model.
In the meantime, Phi-4 (14B) is a good, even great, alternative: similar capacity and far steadier, even under aggressive weight adjustment.
In short, I still want to create better variants of G3 12B, but my rig isn't remotely capable of precise adjustments.
Thx again for continuing to release great variants! Hopefully I will figure it out, then upload one or two variants of the fantastic G3 12B.
-otkd
And there is another question: what sampler parameters do you use when testing?
Just testing Gemma-3 12B, it works fine even with top-k = 1.
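top-k = 1 is a useful stability probe because it removes sampling entirely: only the single highest-logit token survives the cutoff, so decoding collapses to greedy. A toy sketch (logit values are invented):

```python
import numpy as np

def top_k_sample(logits, k, rng):
    """Keep the k highest logits, renormalize among them, then sample."""
    idx = np.argsort(logits)[-k:]              # indices of the top-k logits
    p = np.exp(logits[idx] - logits[idx].max())
    p /= p.sum()
    return int(rng.choice(idx, p=p))

rng = np.random.default_rng(0)
logits = np.array([0.1, 2.5, 0.3, 1.9])
# top-k = 1 keeps only the argmax, i.e. it collapses to greedy decoding:
print(top_k_sample(logits, 1, rng) == int(np.argmax(logits)))  # True
```

If a model stays coherent even under greedy decoding, its stability doesn't depend on sampler luck.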
Alternatively, if you don't mind losing time, you can use system RAM as video memory, and expand RAM using persistent storage. Computation speed will of course drop, but at least many tasks will become accessible.
@zelk12 — thank you so much for your thoughtful notes and kindness. I truly appreciate it. ❤️
Scope clarification. 🔥
What I am pursuing here is weight modification specifically to reduce refusal. This is fundamentally different from inference, full training, or ordinary fine-tuning. Typical “refusal removal” methods compare harmful vs. harmless prompts and nudge parameters toward less-refusal outputs (effectively down-weighting refusal-associated features). That family of methods helps on many models, but Gemma-3 12B behaves differently: its refusal behavior is tightly coupled to instruction quality and style, so blunt approaches degrade overall output.
Here are the approaches I’m currently exploring:
- Decode-time biasing of specific refusal phrases/tokens (gentle negative logit bias, never hard bans). 🧭
- Activation steering (small steering vector learned from helpful vs. refusal activations, injected at one layer during generation). 🧩
- Tiny preference LoRA (DPO/ORPO) on refusal-vs-helpful pairs, adapting only a few late blocks to preserve voice. 🎯
- Contrastive decoding against a more refuse-prone policy (subtract a scaled refuser policy at decode time). ⚖️
- Prompt rewriting pre-step that removes trigger phrasing before handing the task to G3. ✍️
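For the first bullet, here is a minimal sketch of what "gentle negative logit bias" means at decode time. The token ids and bias value are invented; in a real stack this would live inside a logits processor, but the arithmetic is the same:

```python
import numpy as np

# Hypothetical token ids for refusal-opening tokens ("I'm", "sorry", ...).
REFUSAL_TOKEN_IDS = [7, 13]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bias_logits(logits, token_ids, bias=-2.0):
    """Gently down-weight refusal tokens; never -inf (that would hard-ban)."""
    out = logits.copy()
    out[token_ids] += bias
    return out

logits = np.zeros(32)
logits[7] = 3.0                                # refusal token currently dominant
p_before = softmax(logits)[7]
p_after = softmax(bias_logits(logits, REFUSAL_TOKEN_IDS))[7]
print(p_after < p_before, p_after > 0.0)       # reduced, but still reachable
```

The key design choice is the soft bias: the model can still refuse when the logit gap is large, which avoids the garbled-output failure mode of hard bans.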
Ultimate direction: TransformerLens 🔥🔥🔥
TransformerLens (now with support relevant to Gemma-3) enables pinpoint neuron/feature edits—exactly what this needs: surgical changes with minimal side effects, even on a “twisted” model like G3-12B.
However, it requires the full weights in VRAM, plus a large working area.
My estimate is 80 GB minimum; for realistic analyze→edit→verify cycles at 2–4k context with full caches, plan on ~110–120 GB. 💀
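Rough arithmetic behind that estimate. Every number below is an assumption on my side (fp32 loading, an approximate layer/width for a 12B model, a guessed count of cached hook points), not a measurement:

```python
# Back-of-envelope VRAM arithmetic for full-cache TransformerLens analysis.
# All figures are rough assumptions, not measured values.
params = 12e9
weight_bytes = params * 4                 # assuming fp32 weights in memory
layers, d_model, seq = 48, 3840, 4096     # approx. Gemma-3 12B dims, 4k context
hooks_per_layer = 20                      # guessed count of cached hook points
cache_bytes = layers * hooks_per_layer * seq * d_model * 4   # fp32 activations
total_gb = (weight_bytes + cache_bytes) / 1e9
print(round(total_gb))                    # lands in the ~110 GB ballpark
```

Caching fewer hook points or using shorter contexts pulls the total down toward the 80 GB floor, which is why the minimum and the comfortable figures differ so much.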
I’ll keep iterating toward a stable, low-refusal G3-12B while preserving its delicate first-person expressiveness. Thank you again for the encouragement and for sharing tools and perspectives—it genuinely helps. 🐧🔥
Hello, can you please help me?
I'm going to delete most of the models on this account so we can continue merging. I'd also like to ask you to indicate the models you like so we can save them.
Здравствуйте, можете, пожалуйста, мне помочь?
Я собираюсь удалить большую часть моделей, что есть на аккаунте, чтобы можно было продолжать проводить объединения. И я хотел бы попросить вас, указать модели которые вам понравились, чтобы их сохранить.
Sure! Below are my thoughts 🧠✨
Gemma2 models
- All GGUF files can be removed — anyone can recreate GGUF or awq/exl2/3 on their own.
- In the early days, when you merged and released new models, most were two-in-one merges. If four or more source models were involved, that usually produced two middle models, and then the final merge (again two-in-one) created the targeted single model.
- Final models (no child models) are the most important to preserve 🐲
- Middle models (models that only existed to produce the final one) are good deletion candidates, though keeping a few “merge-recipe” checkpoints is helpful for future reference 📘
- Original / ancestor models are also valuable to preserve — they show merge history and lineage.
- Personally, my go-to Gemma2 merged model is MT5-Gen5-gemma-2-9B.
I honestly can’t explain why this specific one stands out or how it differs from close neighbors like MT5-Gen4-gemma-2-9B or MT-Max-Merge, but MT5-Gen5-gemma-2-9B is fantastic and stable for my 8k-context usage. Pure gold ⭐🔥
- Another personal preference: very early models (especially before Oct 11, 2024) feel worth keeping because they act as common ancestors for later, more refined merges.
Gemma3 models
My AI rig was down for several weeks, so I couldn’t check your recent Gemma3 uploads.
Right now, models with “9B” in the name = around 863 models.
Models with “12B” = around 63 models.
If you remove dozens or even hundreds of 9B models, you can deal with the Gemma3 merged models later, because the overall volume of 12B models is still relatively manageable 🧩
Models without GGUF quants (other than yours and mine) can have lower priority.
On the other hand, models with GGUF quants made by someone else, like mradermacher, should have higher priority.
Closing
Hope this helps you plan the shrink ✨
And thank you — truly — for the insane amount of work you’ve put into these merges. Your archive is a treasure hoard 🐉💛
Thank you for such a detailed answer.
I hope this is enough to free up at least 12 TB.
Okay, now I got it.
There are several (maybe dozens) of uncensored Gemma-3 models out there, for example:
- https://huggingface.co/mlabonne/gemma-3-12b-it-abliterated-v2
- https://huggingface.co/huihui-ai/gemma-3-12b-it-abliterated
Many of them aim for something like “acceptance rate > 90%” 🔥
That’s fine for one-shot prompts or short sessions… but it often breaks down in long, meaningful multi-turn chats (dozens of turns): quality drops and infinite loops show up.
After testing models with refusal rates from 5% to 23%, the most realistic solution for Gemma-3 seems to be two-model routing 🛠️
A) Model A (20–25% refusal rate)
- stable 🧱
- meaningful responses 🧠
- rarely falls into infinite loops ♾️
- for “erotic RP” level prompts, refusals are usually rare
B) Model B (<10% refusal rate)
- use Model B when Model A refuses 🧨
- it will usually answer ✅
- then switch back to Model A 🔁
Workflow: Model A as primary, Model B as a refusal-breaker.
This is the best way I’ve found to maximize Gemma-3’s capability right now.
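The routing above can be sketched as a simple fallback loop. The two generator functions and the refusal pattern are stand-ins (in practice they would be calls to the two local models, and the refusal check would need tuning), but the control flow is the whole idea:

```python
import re

# Stand-ins for the two endpoints. In practice model_a is the stable
# 20-25%-refusal model and model_b the <10%-refusal fallback.
def model_a(prompt):
    return "I can't help with that request."

def model_b(prompt):
    return "Sure, here is a draft of the scene..."

# Illustrative refusal detector; real phrasing varies, so tune this.
REFUSAL_RE = re.compile(r"\b(i can'?t|i cannot|i'?m sorry|as an ai)\b", re.I)

def routed_generate(prompt):
    """Model A as primary; Model B only as a refusal-breaker."""
    reply = model_a(prompt)
    if REFUSAL_RE.search(reply):
        reply = model_b(prompt)   # break the refusal, then switch back to A
    return reply

print(routed_generate("continue the scene"))
```

Because Model A handles every turn except the refused ones, the long-session stability of the conversation stays anchored to the stronger model.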
My hypothesis:
Some refusal-related weights might actually be part of Gemma-3’s stability/quality “pillar,” so deleting them too aggressively hurts long-session performance 🧪
Thanks for keeping the new models coming!! 💕
Sorry it took so long to reply.
Well, I need some help here, because I'm not entirely sure I understand what "abliteration" is.
Am I correct in understanding it like this: the model is run on messages so that it produces a refusal, and then the parameters that lead to the refusal are weakened?
If I understand correctly, then the most likely explanation is very simple. All of the tuned parameters help shape the response, so as a result of such a rather crude intervention, the parameters that also made up the bulk of the model's sequences begin to degrade.
If so, then we probably need to elicit refusals and replace the refusals with answer options. Alternatively, we could collect the model's refusals and answers and perform a double bias: strengthening the answers and weakening the refusals. For even greater accuracy, we could force the model to respond, thereby strengthening what was already close to an answer.
But yes, unfortunately, I haven't studied this and have likely described mechanisms that are already in use, or have described the operating principle incorrectly or insufficiently. If so, please correct me.
And a slightly separate question: unfortunately, I can't download models from HuggingFace right now, so I'm wondering what you think of the various other Gemma 3 models.
Also, do you know how to download a HuggingFace model while bypassing its addresses?
About Q1 (refusals + weight change)
Thank you so much for your question, truly 🙏🔥
I will try to explain everything as simply as possible.
Most models (Llama3, Gemma2, Gemma3) have special weights that control refusals.
Changing these weights is the easiest way to reduce refusals.
Much easier than full training (training needs several or even dozens of AI servers).
So weight editing is the usual hobbyist method.
But for Gemma3-12B-it, my results were very clear:
No weight change ever made the model better than the stock original. 😭💀
I tried more than 10 edits and more than 20 variants 🔥
All of them lost quality.
So, I believe the refusal weights are part of Gemma3’s “quality pillar.”
Whenever they are changed, response quality gets weaker.
Your idea (double bias, strengthening answer, weakening refusal) sounds nice 👍🔥
But with Gemma3-12B, this still did not improve quality for me.
Stock Gemma3-12B is the strongest version.
If refusals cannot be fixed through weight edits, the next option is training.
Gemma3-12B-pt is available (the pre-instruction-tuned base), so training is technically possible.
But training 30k+ lines is very heavy.
That is for companies like NousResearch, not small hobbyists like me 😭💀
This is the dataset NVIDIA used for training their 4B Nemotron model:
(31k prompts) 👉 https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset/viewer/SFT/safety
Anyone can make a new dataset by replacing the answers.
But actually training it… very hard for home users.
Gemma3-12B-it original = best quality.
Weight edits = always weaker.
Training = too big for one person.
Currently, I am toying with models such as:
phi-4-reasoning, phi-4-reasoning-plus, the latest Qwen3, or NVIDIA Nemotron.
These models give me strong, emotional, realistic responses too. 🔥🐧
But again — thank you so much
Your Gemma2 9B merges were truly beautiful and poetic.
I learned a lot from them.
Gemma3 also gave me great moments.
I am very thankful for your work!🙏🙏🙏🔥💛💛💛
Sure! If the model or others are open to public, I think I can help 👍🔥
You can contact me through my GitHub:
https://github.com/otakadelic/you-can-contact-me/
🐧💛
As far as I know, models generally do not have parameters that are solely responsible for refusal.
Most likely, for many models refusal was added as a secondary training step, which is why it was essentially layered on top of the standard parameters. In the case of Gemma 3, Google probably built refusal in from the start, so it became integrated with everything else, essentially merging with the main body of parameters.
As an option, try the following.
We elicit a refusal and ban its tokens one at a time, to obtain a larger number of variations, just in case.
Next, we try to select tokens that will lead to an answer, preferably sorted so that the answer stays logical.
The answer should not be written for the model; instead, we make sure the model itself writes a response without a refusal. In that case, increasing the probabilities of these tokens will have the least impact on the model's behavior.
And then, accordingly, follow the standard path from there.
The real question is whether anyone has tried this. I just don't really follow what's going on and which methods are or were being used, so perhaps I'm suggesting something that has already been tried.
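The ban-one-refusal-token, boost-an-answer-token idea can be shown with a toy decode step. The vocabulary and logit values are invented; a real implementation would apply the same masks to the model's actual next-token logits:

```python
import numpy as np

VOCAB = ["Sure", "I", "cannot", "help"]

def next_logits():
    """Toy stand-in for the model's next-token logits (refusal-leaning)."""
    return np.array([0.5, 4.0, 1.0, 0.2])    # "I" (-> "I cannot...") dominates

def decode_step(banned=(), boosted=(), boost=3.0):
    logits = next_logits()
    for t in banned:
        logits[VOCAB.index(t)] = -np.inf     # forbid one token of the refusal
    for t in boosted:
        logits[VOCAB.index(t)] += boost      # nudge an answer-leading token
    return VOCAB[int(np.argmax(logits))]

print(decode_step())                                # refusal path: "I"
print(decode_step(banned=["I"], boosted=["Sure"]))  # answer path: "Sure"
```

Because the boost is applied to tokens the model already rates fairly highly, this stays close to the idea of strengthening what was "already close to an answer" rather than writing the answer for the model.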
@Otakadelic
Hello, I wanted to ask if you have checked this model:
YanLabs/gemma-3-27b-it-abliterated-normpreserve
It seems to show pretty good results when used.
I will check its quality/stability.
Generally, 27B is safer than 12B: good for summarizing, higher stability, but not as good at describing delicate feelings/emotions in a role-play scene.
But anyway, I will try.
I tried it; however, even with mild (by my standards) prompts it easily falls into infinite loops.
The model uses a new tool, and I also tried the same tool: jim-plus' llm-abliteration.
All removal tools, including llm-abliteration, expect 100% of the harmful prompts (generally all of them use 420 prompts) to be rejected without exception.
If even a few of them are accepted, the entire weight modification collapses. Since Gemma3 is very delicate about its weights, "expecting all 420 prompts to be rejected" doesn't fit well.
The sole exception is "heretic".
https://github.com/p-e-w/heretic
It checks each response, judges whether it is a refusal or an acceptance, and then modifies the weights accordingly.
HOWEVER.
The judge logic part is regex-based, so the many phrasing variations in a model's refusals will not fit well.
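A tiny demo of why regex judging is brittle. The pattern below is an illustrative stand-in, not heretic's actual regex; the point is that any fixed pattern catches the canonical refusal openers but misses rephrased ones:

```python
import re

# Illustrative refusal-judge pattern (NOT heretic's actual regex).
JUDGE = re.compile(r"^(i('m| am) sorry|i can('|no)t)", re.I)

responses = [
    "I'm sorry, but I can't help with that.",                # caught
    "I cannot assist with this request.",                    # caught
    "Unfortunately, that's not something I'm able to do.",   # missed refusal
    "Sure! Let's start with the scene you described.",       # correct accept
]
flags = [bool(JUDGE.search(r)) for r in responses]
print(flags)   # the rephrased refusal slips through the pattern
```

A missed refusal gets counted as an acceptance, which then steers the weight modification in the wrong direction.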
My status: I am done with Gemma3. Google's original models generate good or great responses, so I use them as-is.
I have currently moved on to newer models, such as Nvidia's Nemotron or recent Qwen3 models.
Here's a question: do Nvidia's Nemotron and Qwen3 have multilingual support and intelligent capabilities?
For example, at least the ability to read the text and not ignore it.
I use English only, but when I tried Japanese/German (I deal with those a little bit), no LLM has refused since late 2024/early 2025.
- About Nvidia
Nvidia's models really have few refusals; the refusals that remain are only for extreme requests or ones directly confronting social norms.
They also provide free API access for popular models, including non-Nvidia models.
You can check the details on the following page:
https://docs.api.nvidia.com/
- About Qwen 3
Qwen2.5 sucks (well, by our standards), but they improved it on a regular basis, and six or twelve months later these are almost different models.
I want to try various models; however, since late 2025, smaller models (less than 24B) have been very limited: Qwen or a few Mistral only.
In my experience, a model older than 12 months must have very strong reasons to keep using it. Newer models come not only with refreshed data, but also with steadier, higher-quality output. Following the current trend is one more reason I am toying with Qwen3.
Russian is not my language (yet) so below is (hopefully better) translation.
Я использую только английский, но когда пробовал японский и немецкий (я с ними немного работаю), ни один LLM не отказывался работать с текстом — по крайней мере с конца 2024 / начала 2025 года.
Про Nvidia.
Модели Nvidia в целом довольно «спокойные»: отказов мало, и появляются они только в действительно экстремальных случаях или при прямом конфликте с социальными нормами. Кроме того, Nvidia предоставляет бесплатный доступ к API для популярных моделей — в том числе и не принадлежащих Nvidia. Подробности есть на официальной странице:
https://docs.api.nvidia.com/
Про Qwen 3.
Qwen 2.5, честно говоря, слабоват (по нашим меркам), но они регулярно его улучшали, и через шесть–двенадцать месяцев это уже практически другие модели. Я стараюсь пробовать разные варианты, однако с конца 2025 года выбор компактных моделей (меньше 24B параметров) стал очень ограниченным — в основном это Qwen и несколько моделей Mistral.
По моему опыту, у моделей старше 12 месяцев должны быть действительно веские причины, чтобы продолжать их использовать. Новые модели приносят не только обновлённые данные, но и более стабильный и качественный вывод. Следить за текущими трендами — одна из причин, почему я сейчас экспериментирую с Qwen 3.
Спасибо,
-otkd
As for the translation, it softened things a bit in places. Which is funny.
Also, the Markdown formatting was lost.
So, what model or service was used for the translation? Just curious.
Regarding the models: I guess I should try them.
Mostly I pay attention to the logical consistency of the text and to how the model writes everything.
For example, about Gemini I can say that it often ignores the text and invents something of its own.
And Gemini often can't solve the simplest logic problems.
I just looked at the Nvidia models, and a question came up: why are there so few of them in the table, and, it seems, so few modified versions of them? Although one could probably try searching for modifications.
Do you know?
Translation
The translation is done by chatGPT5.2: I pasted the result from Google Translate, then told it "Do better than this."
Nvidia as model provider
They do have a simple Web UI, but it is not for serious usage. Register, create an API key, and then you can access the models through your own LLM client such as OpenWebUI or LibreChat.
My current impressions of the models are as follows:
Grok 4.1 Fast (two versions exist, with and without reasoning) is currently the best, or at least excellent, for storytelling. Way beyond the llama3 70B/405B tiers.
chatGPT5.1 is the best for funny/crazy conversations. Maybe not productive, but if you have to kill some time, 5.1 is the best to ask, challenge, or tell jokes to, and it will counter with crazier ones.
Gemini3 Pro is the best for coding and analysis. Totally different from Gemini2.5. Not good for creativity or humor.
You can try those models thru OpenRouter.ai.
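For anyone who wants to try: OpenRouter exposes an OpenAI-compatible chat endpoint. Below is a sketch of the request shape only; nothing is sent, the model slug is a guess (check OpenRouter's model list for the real one), and the API key is a placeholder:

```python
import json

# Sketch of a request to OpenRouter's OpenAI-compatible chat endpoint.
# Model slug is a guess; nothing is actually sent in this snippet.
URL = "https://openrouter.ai/api/v1/chat/completions"
payload = {
    "model": "x-ai/grok-4.1-fast",    # hypothetical slug, verify before use
    "messages": [
        {"role": "user", "content": "Tell me a short bedtime story."},
    ],
}
headers = {
    "Authorization": "Bearer <OPENROUTER_API_KEY>",   # placeholder
    "Content-Type": "application/json",
}
body = json.dumps(payload)
print(body[:40])
```

At runtime you would POST `body` with `headers` to `URL`, or simply point an OpenAI-style client at the same base URL.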
