# Nexura-Gemma-2B
**A Supervised Fine-Tuned + DPO-Aligned Gemma-2B Model**
Nexura-Gemma-2B is a custom fine-tuned variant of Google's Gemma-2B model.
It is trained in two stages:
- SFT (Supervised Fine-Tuning) using high-quality instruction datasets
- DPO (Direct Preference Optimization) for preference alignment
The model follows a strict XML-style instruction format, exactly matching the SFT training data:
```
<user>
{instruction}
</user>

<assistant>
{response}
```
## 1. Base Model
- Base: `google/gemma-2b`
- Architecture: Decoder-only transformer LLM
- Tokenizer: Gemma tokenizer (sentencepiece)
- Training Type: QLoRA (SFT) + DPO
- Language: English
- Usage: General-purpose text generation & instruction following
## 2. Datasets Used

### A. SFT Dataset (Supervised Fine-Tuning)
Merged into: `train_sft_50k.jsonl`
Includes:
- tatsu-lab/alpaca (~52k)
- databricks/dolly-15k
- Additional filtered samples (mostly skipped due to filtering):
  - lamini_20k
  - ign_20k
  - ultrachat_20k
SFT Prompt Format:

```
<user>
{instruction}
</user>

<assistant>
{response}
```
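The exact merging and filtering script is not included in this card; the following is a minimal sketch, assuming Alpaca-style records (`instruction`, `input`, `output` fields) are rendered into the XML-style format above before being written to `train_sft_50k.jsonl`. The file names and the merge step are illustrative only.

```python
import json

def to_sft_text(example: dict) -> str:
    """Render one Alpaca-style record into the XML-style SFT format above."""
    instruction = example["instruction"].strip()
    if example.get("input"):
        # Fold the optional context field into the user turn
        instruction = f"{instruction}\n\n{example['input'].strip()}"
    response = example["output"].strip()
    return f"<user>\n{instruction}\n</user>\n\n<assistant>\n{response}"

# Illustrative merge step; the real filtering/merging pipeline is not shown here
with open("alpaca.jsonl") as src, open("train_sft_50k.jsonl", "w") as dst:
    for line in src:
        record = json.loads(line)
        dst.write(json.dumps({"text": to_sft_text(record)}) + "\n")
```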
### B. DPO Dataset (Preference Alignment)
Merged from:
- Anthropic HH-RLHF
- Stanford SHP
- UltraFeedback
- JudgeLM
Used in chosen-vs-rejected pair format.
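The merged preference files are not reproduced here; a single normalized record would look roughly like the sketch below, using the common `prompt`/`chosen`/`rejected` field convention (the actual schema of the merged dataset is an assumption).

```python
# One preference pair in the common prompt/chosen/rejected layout
# (field names follow the usual TRL convention, assumed rather than confirmed here).
dpo_example = {
    "prompt": "<user>\nExplain recursion.\n</user>\n\n<assistant>\n",
    "chosen": "Recursion is when a function solves a problem by calling itself on smaller inputs...",
    "rejected": "Recursion is just a loop.",
}
```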
## 3. Training Details

### SFT (Supervised Fine-Tuning)
QLoRA Configuration:
- Rank: 8
- Alpha: 16
- Dropout: 0.05
- Precision: bfloat16
- Epochs: 1
- LR: 2e-4
- Gradient Accumulation: 20
- Target Modules:
  - q_proj, k_proj, v_proj, o_proj
  - gate_proj, up_proj, down_proj
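These hyperparameters map directly onto a PEFT `LoraConfig`; the sketch below shows the corresponding adapter configuration, assuming the `peft` library was used (the actual training script is not part of this card).

```python
from peft import LoraConfig

# Adapter settings matching the QLoRA values listed above
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```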
### DPO (Direct Preference Optimization)
- Beta: 0.1
- Learning rate: 5e-5
- Grad Accumulation: 8
- Policy model = SFT-trained adapter
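These settings correspond to a standard TRL DPO run; below is a minimal sketch assuming `trl`'s `DPOConfig`/`DPOTrainer` were used (argument names vary slightly across `trl` versions, and `sft_model`, `dpo_dataset`, and `tokenizer` are placeholders for objects created earlier in the pipeline).

```python
from trl import DPOConfig, DPOTrainer

# Preference-alignment settings matching the values listed above
dpo_args = DPOConfig(
    output_dir="nexura-gemma2b-dpo",
    beta=0.1,
    learning_rate=5e-5,
    gradient_accumulation_steps=8,
)

trainer = DPOTrainer(
    model=sft_model,            # SFT-trained adapter acts as the policy model
    ref_model=None,             # with a PEFT adapter, the frozen base is used as reference
    args=dpo_args,
    train_dataset=dpo_dataset,  # records in prompt/chosen/rejected format
    processing_class=tokenizer, # called `tokenizer` in older trl releases
)
trainer.train()
```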
## 4. Inference Instructions
Below is the exact format required to prompt the model, matching the training:
Prompt Template:

```
<user>
{your_message}
</user>

<assistant>
```
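If you are building the prompt from a chat-style message list (as the API example below does), a small helper keeps the format exact; this is a sketch, not the builder used in `server.py`.

```python
def build_prompt(messages: list[dict]) -> str:
    """Convert [{"role": "user", "content": ...}, ...] into the SFT prompt format."""
    parts = []
    for message in messages:
        if message["role"] == "user":
            parts.append(f"<user>\n{message['content']}\n</user>")
        elif message["role"] == "assistant":
            parts.append(f"<assistant>\n{message['content']}")
    parts.append("<assistant>\n")  # leave the assistant turn open for generation
    return "\n\n".join(parts)

# Produces the same prompt string used in the Python example below
print(build_prompt([{"role": "user", "content": "Explain recursion."}]))
```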
### FastAPI Streaming Server (`server.py`)
This model was tested using a custom FastAPI server with:
- Local model loading (no HF auto-download)
- SFT-exact prompt builder
- Tag suppression to prevent invalid XML-like output
- Greedy decoding: `do_sample=False`, `repetition_penalty=1.3`, `no_repeat_ngram_size=4`
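The full `server.py` is not reproduced in this card. Below is a minimal, non-streaming sketch of an `/api/chat` endpoint with the same greedy decoding settings; the streaming and tag-suppression logic of the actual server is omitted, and the model path is assumed to be a local directory.

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

app = FastAPI()

MODEL_DIR = "Nexura-gemma2b-sft-dpo"  # local path, no HF auto-download
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, device_map="auto")

class ChatRequest(BaseModel):
    messages: list[dict]

@app.post("/api/chat")
def chat(request: ChatRequest):
    # Rebuild the SFT-exact prompt from the last user message (simplified)
    user_text = request.messages[-1]["content"]
    prompt = f"<user>\n{user_text}\n</user>\n\n<assistant>\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        repetition_penalty=1.3,
        no_repeat_ngram_size=4,
    )
    # Return only the newly generated tokens, not the echoed prompt
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return {"response": tokenizer.decode(new_tokens, skip_special_tokens=True)}
```

Run it with, for example, `uvicorn server:app --port 8000` and query it with the curl example below.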
### Example: Python Local Inference
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_dir = "Nexura-gemma2b-sft-dpo"

# Load the fine-tuned model and tokenizer from the local directory
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")

# Build the prompt in the exact SFT format
prompt = "<user>\nExplain recursion.\n</user>\n\n<assistant>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding, matching the server settings
output = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,
    repetition_penalty=1.3,
    no_repeat_ngram_size=4,
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
### Curl API Example
```bash
curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hi"}]}'
```
## 5. Intended Use

### Recommended Uses
- Chat assistants
- Instruction following
- Educational Q/A
- Coding help
- Summaries
- Reasoning tasks
- Content rewriting
### Not Recommended
- Medical, legal, or financial advice
- Real-world decision making
- High-risk or safety-critical systems
- Generating harmful, biased, or toxic content
## 6. Strengths
- Lightweight (2B parameters)
- Fast inference on consumer GPUs
- Clean behavior after SFT formatting correction
- Strong alignment after DPO training
- Stable responses due to greedy decoding
## 7. Limitations
- Limited knowledge compared to larger LLMs
- May hallucinate if prompt format is not followed
- Not multilingual
- No factual updates after 2023 (Gemma limitation)
## 8. Hardware Requirements
- GPU Recommended: 8GB+ VRAM
- Minimum CPU RAM: 6GB
- Quantized 4-bit mode: Runs on mid-range systems
- Ideal: NVIDIA RTX 3060 / 4060+
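For the quantized 4-bit mode mentioned above, the model can be loaded through bitsandbytes; the sketch below uses common NF4 defaults, which may not match the exact configuration used during testing.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_dir = "Nexura-gemma2b-sft-dpo"

# NF4 4-bit quantization; common defaults, not a confirmed test configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    quantization_config=bnb_config,
    device_map="auto",
)
```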
## 9. License

This model inherits the Gemma license terms, which allow:

- Research use
- Commercial use under conditions

Attribution to Google is required.
Full license details:
https://ai.google.dev/gemma/terms
## 10. Citation

If you use this model:

```bibtex
@misc{nexura_gemma2b_2025,
  title={Nexura-Gemma-2B},
  author={Arun Vpp},
  year={2025},
  publisher={Hugging Face},
  note={Custom fine-tuned Gemma-2B}
}
```
## Final Notes

This README is fully compatible with Hugging Face's metadata requirements.
Just paste it into your README.md; no modification needed.