Gemma 3 - 1B Elite Fusion (Experimental)
This model is the result of a specialized "Elite Neuron Fusion" technique applied to the Gemma architecture. It is not a standard model merge; rather, it uses a surgical approach to inject reasoning capabilities from earlier layers into deeper layers.
🔬 Methodology: Elite Neuron Fusion
Unlike traditional merging methods (SLERP, Linear) that blend entire weights, this method uses a density-based injection algorithm.
- Layer Analysis: We identified specific resonance pairs between source (early-mid) and target (mid-deep) layers.
- Top-k Filtering: For each pair, we calculated the delta vector (the element-wise difference between the source and target layer weights).
- Density Selection: Only the top 20% of neurons with the largest-magnitude change were selected.
- Injection: These "elite" neurons were injected into the target layers with a fixed alpha scaling factor (a sketch of this step follows the list).
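As a rough illustration, the sketch below applies this selection-and-injection step to a single source/target weight pair in PyTorch. The function name `elite_fusion`, the use of absolute delta magnitude as the selection criterion, and the flattened top-k mask are assumptions for illustration, not the exact implementation used for this model.

```python
import torch

def elite_fusion(source_w: torch.Tensor, target_w: torch.Tensor,
                 density: float = 0.20, alpha: float = 0.40) -> torch.Tensor:
    """Inject the top-`density` fraction of the delta from a source layer
    weight into a target layer weight, scaled by `alpha` (illustrative)."""
    # Delta vector: difference between the source and target weights.
    delta = source_w - target_w
    # Keep only the entries with the largest-magnitude change (top `density`).
    k = max(1, int(density * delta.numel()))
    threshold = torch.topk(delta.abs().flatten(), k).values.min()
    mask = (delta.abs() >= threshold).to(delta.dtype)
    # Injection: Target = Target + (Delta * Mask * Alpha)
    return target_w + delta * mask * alpha
```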
Technical Configuration
- Source Layers: 16, 15, 14, 13, 12
- Target Layers: 17, 18, 19, 20, 21
- Density: 0.20 (Only 20% of weights are modified per layer)
- Alpha: 0.40
- Logic: `Target = Target + (Delta * Mask * Alpha)`, applied per source/target layer pair (see the sketch below)
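Assuming the source and target layers are paired in the order listed above (16→17, 15→18, and so on), this configuration could be applied to a base checkpoint roughly as follows, reusing the `elite_fusion` helper sketched earlier. The base model id `google/gemma-3-1b-it`, the `model.layers.{i}.` parameter-name pattern, and the choice to fuse every 2-D weight matrix in each paired block are assumptions for illustration, not the published recipe.

```python
# Illustrative application of the configuration above; reuses elite_fusion()
# from the previous sketch. The layer pairing, parameter-name pattern, and
# base checkpoint are assumptions, not the exact recipe used for this model.
import torch
from transformers import AutoModelForCausalLM

SOURCE_LAYERS = [16, 15, 14, 13, 12]
TARGET_LAYERS = [17, 18, 19, 20, 21]
DENSITY, ALPHA = 0.20, 0.40

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-it", torch_dtype=torch.float32
)
state = model.state_dict()

with torch.no_grad():
    for src, tgt in zip(SOURCE_LAYERS, TARGET_LAYERS):
        for name, target_w in state.items():
            # Fuse every 2-D weight matrix of the paired transformer blocks.
            if f"layers.{tgt}." in name and name.endswith(".weight") and target_w.dim() == 2:
                source_w = state[name.replace(f"layers.{tgt}.", f"layers.{src}.")]
                target_w.copy_(elite_fusion(source_w, target_w, DENSITY, ALPHA))

model.save_pretrained("gemma-3-1b-elite-fusion")
```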
🎯 Goal
The primary goal of this experiment is to enhance the reasoning and logic capabilities of smaller language models (1B-2B range) without destroying their pre-trained knowledge base or causing severe hallucinations.
💻 Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "jetbabareal/gemma-3-1b-elite"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Replace the placeholder question with your own prompt.
input_text = "Question: ?\nAnswer:"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
- Developer: jetbabareal
- Algorithm: Elite Neuron Fusion