Update README.md
README.md
## 🧪 The Secret Sauce
### Training Diet:
- Fed with 13,000 conversation pairs
- Each conversation is a minimum of 12-13 turns long
- Heavily focused on details such as facial expressions, environmental descriptions, and character reactions, with a strong emphasis on **keeping the model in character.**
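
The dataset schema itself isn't published in this README, so the record layout below is purely an assumption; as a minimal sketch, though, the turn-count requirement amounts to a filter like this, keeping only conversations with at least 12 turns:

```python
# Minimal sketch, not the actual data pipeline: assumes each record is a
# dict with a "conversations" list of {"role": ..., "content": ...} turns.
MIN_TURNS = 12  # every training conversation has at least 12-13 turns

def long_enough(record: dict) -> bool:
    """Keep a record only if its conversation meets the minimum turn count."""
    return len(record.get("conversations", [])) >= MIN_TURNS

records = [
    {"conversations": [{"role": "user", "content": "..."}] * 14},
    {"conversations": [{"role": "user", "content": "..."}] * 6},
]
kept = [r for r in records if long_enough(r)]
print(f"kept {len(kept)} of {len(records)} records")  # -> kept 1 of 2 records
```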
### Tech Wizardry:
- Trained on Llama-3.1-70B-Instruct
- Fine-tuned using QLoRA (see the sketch after this list)
- Trained over 2 epochs
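
The exact QLoRA recipe isn't included here, but a typical setup with `transformers`, `bitsandbytes`, and `peft` looks roughly like the sketch below: the frozen base model is loaded in 4-bit NF4 and small LoRA adapters are trained on top. The adapter rank, alpha, dropout, and target modules are illustrative assumptions, not the values actually used for Cakrawala-70B.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

BASE = "meta-llama/Llama-3.1-70B-Instruct"

# QLoRA step 1: load the frozen base model quantized to 4-bit (NF4).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(
    BASE, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# QLoRA step 2: attach LoRA adapters; only these small matrices are trained.
# Rank/alpha/dropout/target modules below are placeholders, not the released config.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```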
## Training Parameters
- Gradient Accumulation Steps: 1
- Micro Batch Size: 4
- Learning Rate: 0.0002
- Optimizer: AdamW
- Scheduler: Cosine
- Mixed Precision: BF16 & FP16 with TF32 support
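
For reference, here is a sketch of how the parameters above map onto Hugging Face `TrainingArguments`. The output directory is a placeholder, and since `TrainingArguments` takes a single half-precision flag, only `bf16` is shown even though the list mentions both BF16 and FP16.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above; not the actual training script.
training_args = TrainingArguments(
    output_dir="cakrawala-70b-qlora",  # placeholder path
    num_train_epochs=2,                # Trained over 2 epochs
    per_device_train_batch_size=4,     # Micro Batch Size: 4
    gradient_accumulation_steps=1,     # Gradient Accumulation Steps: 1
    learning_rate=2e-4,                # Learning Rate: 0.0002
    optim="adamw_torch",               # Optimizer: AdamW
    lr_scheduler_type="cosine",        # Scheduler: Cosine
    bf16=True,                         # BF16 mixed precision
    tf32=True,                         # TF32 matmuls enabled
)
```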