arunvpp05 committed on
Commit fc53fe0 · verified · 1 parent: b080705

Update README.md

Files changed (1): README.md (+198 -152)
README.md CHANGED
---
license: other
datasets:
- tatsu-lab/alpaca
- databricks/databricks-dolly-15k
- anthropic/hh-rlhf
- stanfordnlp/SHP
- allenai/ultrafeedback
- jondurbin/judgelm
language:
- en
library_name: transformers
pipeline_tag: text-generation
base_model: google/gemma-2b
tags:
- gemma
- sft
- dpo
- lora
- qlora
- alignment
- instruction-following
- fine-tuned
---
 
# 🔷 Nexura-Gemma-2B
### A Supervised Fine-Tuned + DPO-Aligned Gemma-2B Model

Nexura-Gemma-2B is a custom fine-tuned variant of **Google's Gemma-2B** model.
It is trained in **two stages**:

1. **SFT (Supervised Fine-Tuning)** on high-quality instruction datasets
2. **DPO (Direct Preference Optimization)** for preference alignment

The model follows a **strict XML-style instruction format**, exactly matching the SFT training data:

```
<user>
{instruction}
</user>

<assistant>
{response}
</assistant>
```
---

# 📌 1. Base Model

- **Base:** `google/gemma-2b`
- **Architecture:** Decoder-only transformer LLM
- **Tokenizer:** Gemma tokenizer (SentencePiece)
- **Training type:** QLoRA (SFT) + DPO
- **Language:** English
- **Usage:** General-purpose text generation & instruction following

---
 
 
# 📌 2. Datasets Used

## **🟦 A. SFT Dataset (Supervised Fine-Tuning)**

Merged into:
```
train_sft_50k.jsonl
```

Includes:

- `tatsu-lab/alpaca` (~52k samples)
- `databricks/dolly-15k`
- Additional filtered sources (mostly skipped during filtering):
  - lamini_20k
  - ign_20k
  - ultrachat_20k

### SFT Prompt Format

```
<user>
{instruction}
</user>

<assistant>
{response}
</assistant>
```
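
Each source record is normalized into this format before merging. A minimal sketch of that step (the actual merge script is not published; the Alpaca-style field names `instruction`, `input`, and `output` are assumptions):

```python
import json

def to_gemma_sft(record: dict) -> dict:
    """Normalize an Alpaca-style record into the XML-tagged SFT format."""
    instruction = record["instruction"]
    if record.get("input"):  # fold optional context into the user turn
        instruction += "\n" + record["input"]
    return {
        "text": f"<user>\n{instruction}\n</user>\n\n"
                f"<assistant>\n{record['output']}\n</assistant>"
    }

# One normalized JSONL line, as it might appear in train_sft_50k.jsonl:
sample = {"instruction": "Explain recursion.", "input": "", "output": "A function that calls itself."}
print(json.dumps(to_gemma_sft(sample)))
```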

---

## **🟩 B. DPO Dataset (Preference Alignment)**

Merged from:

- **Anthropic HH-RLHF**
- **Stanford SHP**
- **UltraFeedback**
- **JudgeLM**

Used as chosen-vs-rejected comparison pairs for preference learning.
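
For illustration, one merged preference record could look as follows; the `prompt`/`chosen`/`rejected` field names follow the common TRL convention and are an assumption, since the merged file's schema is not documented here:

```python
# Hypothetical example of one chosen-vs-rejected pair in the merged DPO set.
pair = {
    "prompt": "<user>\nIs it safe to look directly at the sun?\n</user>\n\n<assistant>\n",
    "chosen": "No. Looking directly at the sun can permanently damage your eyes.",
    "rejected": "Sure, staring at the sun is harmless.",
}
print(sorted(pair))
```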
 
 
 
---

# 📌 3. Training Details

## 🟦 **SFT (Supervised Fine-Tuning)**

**QLoRA Configuration:**

- Rank (r): **8**
- Alpha: **16**
- Dropout: **0.05**
- Precision: **bfloat16**
- Epochs: **1**
- Learning rate: **2e-4**
- Gradient accumulation: **20**
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
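
Reconstructed as a `peft` configuration, these hyperparameters would read roughly as follows; this is a sketch, not the repository's actual training script:

```python
from peft import LoraConfig  # assumes the peft package is installed

# QLoRA adapter settings as listed above; the surrounding trainer code is omitted.
lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```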
 

---

## 🟩 **DPO (Direct Preference Optimization)**

- Beta (KL penalty): **0.1**
- Learning rate: **5e-5**
- Gradient accumulation: **8**
- Policy model: **SFT-trained adapter**
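
The DPO objective these settings drive can be written in a few lines of plain Python. The four log-probabilities below are hypothetical stand-ins for per-sequence log-probs computed by the policy and the frozen reference (SFT) model:

```python
import math

def dpo_loss(pol_chosen: float, pol_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy matches the reference, the margin is 0 and the loss is log(2).
print(round(dpo_loss(-10.0, -14.0, -10.0, -14.0), 4))  # 0.6931
```

The loss shrinks whenever the policy favors the chosen response more strongly than the reference does; beta = 0.1 keeps the policy from drifting far from the SFT model.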

---

# 📌 4. Inference Instructions

Prompt the model in the **exact format used during training**:

### **Prompt Template**
```
<user>
{your_message}
</user>

<assistant>
```

---

## 🟦 FastAPI Streaming Server (`server.py`)

The model was tested with a custom FastAPI server that provides:

- Local model loading (no automatic Hub download)
- An SFT-exact prompt builder
- Tag suppression to block invalid XML-like output
- Greedy decoding:
  - `do_sample=False`
  - `repetition_penalty=1.3`
  - `no_repeat_ngram_size=4`
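
`server.py` itself is not included in this repository; as an illustration, the prompt builder and tag suppression described above could be implemented like this (both function names are hypothetical):

```python
import re

def build_prompt(user_message: str) -> str:
    """Wrap a user message in the exact SFT format, leaving <assistant> open."""
    return f"<user>\n{user_message}\n</user>\n\n<assistant>\n"

def suppress_tags(generated: str) -> str:
    """Truncate the reply at the first XML-like tag the model emits."""
    return re.split(r"</?\w+>", generated, maxsplit=1)[0].strip()

print(build_prompt("hi"))
print(suppress_tags("Hello!\n</assistant><user>stray text"))  # Hello!
```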

### Example: Python Local Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_dir = "Nexura-gemma2b-sft-dpo"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")
model.eval()

prompt = "<user>\nExplain recursion.\n</user>\n\n<assistant>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        repetition_penalty=1.3,
        no_repeat_ngram_size=4,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
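
The decoded string echoes the prompt before the reply; a small helper (hypothetical, not part of the repository) can isolate the assistant's text:

```python
def extract_reply(decoded: str) -> str:
    """Return only the text after the last <assistant> tag."""
    reply = decoded.rsplit("<assistant>", 1)[-1]
    return reply.split("</assistant>", 1)[0].strip()

print(extract_reply("<user>\nhi\n</user>\n\n<assistant>\nHello there!"))  # Hello there!
```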

---

## 🟩 Curl API Example

```bash
curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hi"}]}'
```

---

# 📌 5. Intended Use

### ✔ Recommended Uses

- Chat assistants
- Instruction following
- Educational Q&A
- Coding help
- Summarization
- Reasoning tasks
- Content rewriting

### ❌ Not Recommended

- Medical, legal, or financial advice
- High-risk or safety-critical decision making
- Generating harmful, biased, or toxic content
 
 
---

# 📌 6. Strengths

- Lightweight (2B parameters)
- Fast inference on consumer GPUs
- Clean instruction-following behavior after SFT formatting correction
- Stronger alignment after DPO training
- Stable, predictable responses under greedy decoding

---
 
 
# 📌 7. Limitations

- Limited knowledge compared to larger LLMs
- May hallucinate if the prompt format is not followed
- Not multilingual (English only)
- No knowledge of events after the base model's 2023 training cutoff


---

# 📌 8. Hardware Requirements

- **Recommended GPU:** 8GB+ VRAM
- **Minimum CPU RAM:** 6GB
- **Quantized 4-bit mode:** runs on mid-range systems
- **Ideal:** NVIDIA RTX 3060 / 4060 or newer
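
A configuration sketch for the 4-bit mode mentioned above, using `bitsandbytes` through transformers (assumes the `bitsandbytes` and `accelerate` packages are installed; not an official recipe from this repository):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Nexura-gemma2b-sft-dpo",  # local model directory, as in the example above
    quantization_config=bnb_cfg,
    device_map="auto",
)
```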

---

# 📌 9. License

This model inherits the **Gemma License** published by Google, which permits:

- Research use
- Commercial use, subject to the license conditions
- Use with attribution to Google

Full license details:
https://ai.google.dev/gemma/terms

---

# 📌 10. Citation

If you use this model, please cite:

```bibtex
@misc{nexura_gemma2b_2025,
  title     = {Nexura-Gemma-2B},
  author    = {Arun Vpp},
  year      = {2025},
  publisher = {Hugging Face},
  note      = {Custom fine-tuned Gemma-2B (SFT + DPO)}
}
```

---

# 🎯 Final Notes

This README follows Hugging Face's model-card metadata requirements and can be used as the model card as-is.