Update README.md
README.md
## 🧪 The Secret Sauce
### Training Diet:
- Fed with 13,000 conversation pairs
- Each conversation is a minimum of 12-13 turns long
- Heavily focused on details such as facial expressions, environmental descriptions, and character reactions, with a strong emphasis on **keeping the model in character.**
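
The dataset schema itself isn't published in this README, so the record layout below is purely an assumption; as a minimal sketch, though, the turn-count requirement amounts to a filter like this, keeping only conversations with at least 12 turns:

```python
# Minimal sketch, not the actual data pipeline: assumes each record is a
# dict with a "conversations" list of {"role": ..., "content": ...} turns.
MIN_TURNS = 12  # every training conversation has at least 12-13 turns

def long_enough(record: dict) -> bool:
    """Keep a record only if its conversation meets the minimum turn count."""
    return len(record.get("conversations", [])) >= MIN_TURNS

records = [
    {"conversations": [{"role": "user", "content": "..."}] * 14},
    {"conversations": [{"role": "user", "content": "..."}] * 6},
]
kept = [r for r in records if long_enough(r)]
print(f"kept {len(kept)} of {len(records)} records")  # -> kept 1 of 2 records
```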
### Tech Wizardry:
- Trained on Llama-3.1-70B-Instruct
- Fine-tuned using QLoRA (see the sketch after this list)
- Trained over 2 epochs
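
The exact QLoRA recipe isn't included here, but a typical setup with `transformers`, `bitsandbytes`, and `peft` looks roughly like the sketch below: the frozen base model is loaded in 4-bit NF4 and small LoRA adapters are trained on top. The adapter rank, alpha, dropout, and target modules are illustrative assumptions, not the values actually used for Cakrawala-70B.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

BASE = "meta-llama/Llama-3.1-70B-Instruct"

# QLoRA step 1: load the frozen base model quantized to 4-bit (NF4).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(
    BASE, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# QLoRA step 2: attach LoRA adapters; only these small matrices are trained.
# Rank/alpha/dropout/target modules below are placeholders, not the released config.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```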
## Training Parameters
- Gradient Accumulation Steps: 1
- Micro Batch Size: 4
- Learning Rate: 0.0002
- Optimizer: AdamW
- Scheduler: Cosine
- Mixed Precision: BF16 & FP16 with TF32 support
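
For reference, here is a sketch of how the parameters above map onto Hugging Face `TrainingArguments`. The output directory is a placeholder, and since `TrainingArguments` takes a single half-precision flag, only `bf16` is shown even though the list mentions both BF16 and FP16.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above; not the actual training script.
training_args = TrainingArguments(
    output_dir="cakrawala-70b-qlora",  # placeholder path
    num_train_epochs=2,                # Trained over 2 epochs
    per_device_train_batch_size=4,     # Micro Batch Size: 4
    gradient_accumulation_steps=1,     # Gradient Accumulation Steps: 1
    learning_rate=2e-4,                # Learning Rate: 0.0002
    optim="adamw_torch",               # Optimizer: AdamW
    lr_scheduler_type="cosine",        # Scheduler: Cosine
    bf16=True,                         # BF16 mixed precision
    tf32=True,                         # TF32 matmuls enabled
)
```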