arunvpp05 committed on
Commit fc53fe0 · verified · 1 parent: b080705

Update README.md

Files changed (1): README.md (+198 -152)
README.md CHANGED
---
license: other
datasets:
- tatsu-lab/alpaca
- databricks/databricks-dolly-15k
- anthropic/hh-rlhf
- stanfordnlp/SHP
- allenai/ultrafeedback
- jondurbin/judgelm
language:
- en
library_name: transformers
pipeline_tag: text-generation
base_model: google/gemma-2b
tags:
- gemma
- sft
- dpo
- lora
- qlora
- alignment
- instruction-following
- fine-tuned
---
 
# 🔷 Nexura-Gemma-2B
### A Supervised Fine-Tuned + DPO-Aligned Gemma-2B Model

Nexura-Gemma-2B is a custom fine-tuned variant of **Google's Gemma-2B** model.
It is trained in **two stages**:

1. **SFT (Supervised Fine-Tuning)** on high-quality instruction datasets
2. **DPO (Direct Preference Optimization)** for preference alignment

The model follows a **strict XML-style instruction format**, exactly matching the SFT training data:

```
<user>
{instruction}
</user>

<assistant>
{response}
</assistant>
```
---

# 📌 1. Base Model

- **Base:** `google/gemma-2b`
- **Architecture:** Decoder-only transformer LLM
- **Tokenizer:** Gemma tokenizer (SentencePiece)
- **Training type:** QLoRA (SFT) + DPO
- **Language:** English
- **Usage:** General-purpose text generation & instruction following

---
 
 
# 📌 2. Datasets Used

## **🟦 A. SFT Dataset (Supervised Fine-Tuning)**

Merged into:
```
train_sft_50k.jsonl
```

Includes:

- `tatsu-lab/alpaca` (~52k samples)
- `databricks/dolly-15k`
- Additional filtered sources (mostly skipped during filtering):
  - lamini_20k
  - ign_20k
  - ultrachat_20k

### SFT Prompt Format

```
<user>
{instruction}
</user>

<assistant>
{response}
</assistant>
```
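
Each source record is normalized into this format before merging. A minimal sketch of that step (the actual merge script is not published; the Alpaca-style field names `instruction`, `input`, and `output` are assumptions):

```python
import json

def to_gemma_sft(record: dict) -> dict:
    """Normalize an Alpaca-style record into the XML-tagged SFT format."""
    instruction = record["instruction"]
    if record.get("input"):  # fold optional context into the user turn
        instruction += "\n" + record["input"]
    return {
        "text": f"<user>\n{instruction}\n</user>\n\n"
                f"<assistant>\n{record['output']}\n</assistant>"
    }

# One normalized JSONL line, as it might appear in train_sft_50k.jsonl:
sample = {"instruction": "Explain recursion.", "input": "", "output": "A function that calls itself."}
print(json.dumps(to_gemma_sft(sample)))
```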

---

## **🟩 B. DPO Dataset (Preference Alignment)**

Merged from:

- **Anthropic HH-RLHF**
- **Stanford SHP**
- **UltraFeedback**
- **JudgeLM**

Used as chosen-vs-rejected comparison pairs for preference learning.
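
For illustration, one merged preference record could look as follows; the `prompt`/`chosen`/`rejected` field names follow the common TRL convention and are an assumption, since the merged file's schema is not documented here:

```python
# Hypothetical example of one chosen-vs-rejected pair in the merged DPO set.
pair = {
    "prompt": "<user>\nIs it safe to look directly at the sun?\n</user>\n\n<assistant>\n",
    "chosen": "No. Looking directly at the sun can permanently damage your eyes.",
    "rejected": "Sure, staring at the sun is harmless.",
}
print(sorted(pair))
```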
 
 
 
---

# 📌 3. Training Details

## 🟦 **SFT (Supervised Fine-Tuning)**

**QLoRA Configuration:**

- Rank (r): **8**
- Alpha: **16**
- Dropout: **0.05**
- Precision: **bfloat16**
- Epochs: **1**
- Learning rate: **2e-4**
- Gradient accumulation: **20**
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
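
Reconstructed as a `peft` configuration, these hyperparameters would read roughly as follows; this is a sketch, not the repository's actual training script:

```python
from peft import LoraConfig  # assumes the peft package is installed

# QLoRA adapter settings as listed above; the surrounding trainer code is omitted.
lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```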
 

---

## 🟩 **DPO (Direct Preference Optimization)**

- Beta (KL penalty): **0.1**
- Learning rate: **5e-5**
- Gradient accumulation: **8**
- Policy model: **SFT-trained adapter**
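
The DPO objective these settings drive can be written in a few lines of plain Python. The four log-probabilities below are hypothetical stand-ins for per-sequence log-probs computed by the policy and the frozen reference (SFT) model:

```python
import math

def dpo_loss(pol_chosen: float, pol_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy matches the reference, the margin is 0 and the loss is log(2).
print(round(dpo_loss(-10.0, -14.0, -10.0, -14.0), 4))  # 0.6931
```

The loss shrinks whenever the policy favors the chosen response more strongly than the reference does; beta = 0.1 keeps the policy from drifting far from the SFT model.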

---

# 📌 4. Inference Instructions

Prompt the model in the **exact format used during training**:

### **Prompt Template**
```
<user>
{your_message}
</user>

<assistant>
```

---

## 🟦 FastAPI Streaming Server (`server.py`)

The model was tested with a custom FastAPI server that provides:

- Local model loading (no automatic Hub download)
- An SFT-exact prompt builder
- Tag suppression to block invalid XML-like output
- Greedy decoding:
  - `do_sample=False`
  - `repetition_penalty=1.3`
  - `no_repeat_ngram_size=4`
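
`server.py` itself is not included in this repository; as an illustration, the prompt builder and tag suppression described above could be implemented like this (both function names are hypothetical):

```python
import re

def build_prompt(user_message: str) -> str:
    """Wrap a user message in the exact SFT format, leaving <assistant> open."""
    return f"<user>\n{user_message}\n</user>\n\n<assistant>\n"

def suppress_tags(generated: str) -> str:
    """Truncate the reply at the first XML-like tag the model emits."""
    return re.split(r"</?\w+>", generated, maxsplit=1)[0].strip()

print(build_prompt("hi"))
print(suppress_tags("Hello!\n</assistant><user>stray text"))  # Hello!
```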

### Example: Python Local Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_dir = "Nexura-gemma2b-sft-dpo"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")
model.eval()

prompt = "<user>\nExplain recursion.\n</user>\n\n<assistant>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        repetition_penalty=1.3,
        no_repeat_ngram_size=4,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
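
The decoded string echoes the prompt before the reply; a small helper (hypothetical, not part of the repository) can isolate the assistant's text:

```python
def extract_reply(decoded: str) -> str:
    """Return only the text after the last <assistant> tag."""
    reply = decoded.rsplit("<assistant>", 1)[-1]
    return reply.split("</assistant>", 1)[0].strip()

print(extract_reply("<user>\nhi\n</user>\n\n<assistant>\nHello there!"))  # Hello there!
```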

---

## 🟩 Curl API Example

```bash
curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hi"}]}'
```

---

# 📌 5. Intended Use

### ✔ Recommended Uses

- Chat assistants
- Instruction following
- Educational Q&A
- Coding help
- Summarization
- Reasoning tasks
- Content rewriting

### ❌ Not Recommended

- Medical, legal, or financial advice
- High-risk or safety-critical decision making
- Generating harmful, biased, or toxic content
 
 
---

# 📌 6. Strengths

- Lightweight (2B parameters)
- Fast inference on consumer GPUs
- Clean instruction-following behavior after SFT formatting correction
- Stronger alignment after DPO training
- Stable, predictable responses under greedy decoding

---
 
 
# 📌 7. Limitations

- Limited knowledge compared to larger LLMs
- May hallucinate if the prompt format is not followed
- Not multilingual (English only)
- No knowledge of events after the base model's 2023 training cutoff


---

# 📌 8. Hardware Requirements

- **Recommended GPU:** 8GB+ VRAM
- **Minimum CPU RAM:** 6GB
- **Quantized 4-bit mode:** runs on mid-range systems
- **Ideal:** NVIDIA RTX 3060 / 4060 or newer
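
A configuration sketch for the 4-bit mode mentioned above, using `bitsandbytes` through transformers (assumes the `bitsandbytes` and `accelerate` packages are installed; not an official recipe from this repository):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Nexura-gemma2b-sft-dpo",  # local model directory, as in the example above
    quantization_config=bnb_cfg,
    device_map="auto",
)
```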

---

# 📌 9. License

This model inherits the **Gemma License** published by Google, which permits:

- Research use
- Commercial use, subject to the license conditions
- Use with attribution to Google

Full license details:
https://ai.google.dev/gemma/terms

---

# 📌 10. Citation

If you use this model, please cite:

```bibtex
@misc{nexura_gemma2b_2025,
  title     = {Nexura-Gemma-2B},
  author    = {Arun Vpp},
  year      = {2025},
  publisher = {Hugging Face},
  note      = {Custom fine-tuned Gemma-2B (SFT + DPO)}
}
```

---

# 🎯 Final Notes

This README follows Hugging Face's model-card metadata requirements and can be used as the model card as-is.