🔷 Nexura-Gemma-2B

A Supervised Fine-Tuned + DPO-Aligned Gemma-2B Model

Nexura-Gemma-2B is a custom fine-tuned variant of Google's Gemma-2B model.
It was trained in two stages:

  1. SFT (Supervised Fine-Tuning) using high-quality instruction datasets
  2. DPO (Direct Preference Optimization) for preference alignment

The model follows a strict XML-style instruction format, exactly matching the SFT training data:

<user>
{instruction}
</user>

<assistant>
{response}

📌 1. Base Model

  • Base: google/gemma-2b
  • Architecture: Decoder-only transformer LLM
  • Tokenizer: Gemma tokenizer (SentencePiece)
  • Training Type: QLoRA (SFT) + DPO
  • Language: English
  • Usage: General-purpose text generation & instruction following

📌 2. Datasets Used

🟦 A. SFT Dataset (Supervised Fine-Tuning)

Merged into:

train_sft_50k.jsonl

Includes:

  • tatsu-lab/alpaca (~52k)
  • databricks/dolly-15k
  • Additional filtered samples:
    • lamini_20k
    • ign_20k
    • ultrachat_20k
      (mostly skipped due to filtering)
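
The merge script itself is not published with this card. Below is a minimal sketch of how these sources could be combined into train_sft_50k.jsonl; the filtering rule and field handling are assumptions, only the dataset IDs and the output file name come from this card.

# Hypothetical merge sketch -- the filter criteria are assumptions.
import json
from datasets import load_dataset

def to_record(instruction, response):
    # Store raw fields; the XML-style template is applied at training time.
    return {"instruction": instruction, "response": response}

records = []

# Alpaca: skip samples that rely on the optional `input` field (assumption).
for row in load_dataset("tatsu-lab/alpaca", split="train"):
    if not row["input"]:
        records.append(to_record(row["instruction"], row["output"]))

# Dolly-15k uses `response` instead of `output`.
for row in load_dataset("databricks/databricks-dolly-15k", split="train"):
    records.append(to_record(row["instruction"], row["response"]))

with open("train_sft_50k.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")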

SFT Prompt Format

<user>
{instruction}
</user>

<assistant>
{response}

🟩 B. DPO Dataset (Preference Alignment)

Merged from:

  • Anthropic HH-RLHF
  • Stanford SHP
  • UltraFeedback
  • JudgeLM

All sources are converted into a chosen-vs-rejected pair format, sketched below.
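
The card does not pin down the exact pair schema. A single record in the common prompt/chosen/rejected layout (TRL's convention, assumed here) might look like:

# One hypothetical preference pair; field names follow TRL's convention.
pair = {
    "prompt": "<user>\nExplain recursion.\n</user>\n\n<assistant>\n",
    "chosen": "Recursion is when a function calls itself on a smaller input...",
    "rejected": "Recursion is a Python loop keyword.",
}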


📌 3. Training Details

🟦 SFT (Supervised Fine-Tuning)

QLoRA Configuration:

  • Rank: 8
  • Alpha: 16
  • Dropout: 0.05
  • Precision: bfloat16
  • Epochs: 1
  • LR: 2e-4
  • Gradient Accumulation: 20
  • Target Modules:
    • q_proj, k_proj, v_proj, o_proj
    • gate_proj, up_proj, down_proj

🟩 DPO (Direct Preference Optimization)

  • Beta: 0.1
  • Learning rate: 5e-5
  • Grad Accumulation: 8
  • Policy model = SFT-trained adapter

📌 4. Inference Instructions

Below is the exact format required to prompt the model, matching the training data:

Prompt Template

<user>
{your_message}
</user>

<assistant>

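If you build prompts programmatically, a small helper keeps the format exact. The function below is hypothetical but matches the template above character for character:

def build_prompt(message: str) -> str:
    """Wrap a user message in the SFT-exact XML-style template."""
    return f"<user>\n{message}\n</user>\n\n<assistant>\n"

prompt = build_prompt("Explain recursion.")
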
🟦 FastAPI Streaming Server (server.py)

This model was tested using a custom FastAPI server with:

  • Local model loading (no HF auto-download)
  • SFT-exact prompt builder
  • Tag suppression to prevent invalid XML-like output
  • Greedy decoding:
    • do_sample=False
    • repetition_penalty=1.3
    • no_repeat_ngram_size=4

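The server code itself is not included in this card. The sketch below shows a minimal non-streaming version of such an endpoint: only the /api/chat route, the prompt template, and the decoding settings come from this card; the request schema, model path, and response shape are assumptions.

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

app = FastAPI()
model_dir = "Nexura-gemma2b-sft-dpo"  # local directory, no auto-download
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")

class ChatRequest(BaseModel):
    messages: list[dict]

def build_prompt(messages):
    # SFT-exact builder; this sketch only uses the last user turn.
    return f"<user>\n{messages[-1]['content']}\n</user>\n\n<assistant>\n"

@app.post("/api/chat")
def chat(req: ChatRequest):
    inputs = tokenizer(build_prompt(req.messages), return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,            # greedy decoding
        repetition_penalty=1.3,
        no_repeat_ngram_size=4,
    )
    # Return only the newly generated tokens, not the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return {"content": tokenizer.decode(new_tokens, skip_special_tokens=True)}
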
Example: Python Local Inference

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_dir = "Nexura-gemma2b-sft-dpo"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

# SFT-exact template: user turn, blank line, open <assistant> tag.
prompt = "<user>\nExplain recursion.\n</user>\n\n<assistant>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding with the settings the model was tested under.
output = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,
    repetition_penalty=1.3,
    no_repeat_ngram_size=4,
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
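
Note that generate returns the prompt tokens followed by the completion, so the decode above prints the template as well. To show only the model's reply, slice off the prompt:

# Keep only the newly generated tokens.
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))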

🟩 Curl API Example

curl -X POST http://localhost:8000/api/chat \
     -H "Content-Type: application/json" \
     -d '{"messages":[{"role":"user","content":"hi"}]}'

📌 5. Intended Use

✔ Recommended Uses

  • Chat assistants
  • Instruction following
  • Educational Q/A
  • Coding help
  • Summaries
  • Reasoning tasks
  • Content rewriting

❌ Not Recommended

  • Medical, legal, or financial advice
  • Real-world decision making
  • High-risk or safety-critical systems
  • Generating harmful, biased, or toxic content

📌 6. Strengths

  • Lightweight (2B parameters)
  • Fast inference on consumer GPUs
  • Clean behavior after SFT formatting correction
  • Strong alignment after DPO training
  • Stable responses due to greedy decoding

📌 7. Limitations

  • Limited knowledge compared to larger LLMs
  • May hallucinate if the prompt format is not followed
  • Not multilingual
  • Knowledge cutoff around 2023, inherited from the Gemma base model

📌 8. Hardware Requirements

  • GPU recommended: 8GB+ VRAM
  • Minimum CPU RAM: 6GB
  • Quantized 4-bit mode: runs on mid-range systems (see the loading sketch below)
  • Ideal: NVIDIA RTX 3060 / 4060 or better

📌 9. License

This model is distributed under the Gemma Terms of Use, which permit:

  • Research use
  • Commercial use, subject to the conditions in the terms
  • Redistribution, provided attribution to Google is retained

Full license details:
https://ai.google.dev/gemma/terms


📌 10. Citation

If you use this model:

@misc{nexura_gemma2b_2025,
  title={Nexura-Gemma-2B},
  author={Arun Vpp},
  year={2025},
  note={Custom fine-tuned Gemma-2B},
  publisher={Hugging Face}
}

