# SRD_V6 - Standard Reasoning Model (Chain-of-Thought)
## Overview
Llama 3.1 8B Instruct fine-tuned on the Standard Reasoning Dataset (SRD), a chain-of-thought (CoT) corpus, with hyperparameters adjusted relative to the CRD run.
## Training Details
- Base Model: meta-llama/Meta-Llama-3.1-8B-Instruct
- Training Framework: Unsloth
- Dataset: CoT Reasoning Data (CoT_reasoning_unsloth.jsonl)
- Examples: 9340
- Training Time: 0.33 hours
- Final Loss: 1.9127
## Hyperparameters (Adjusted for SRD)
- Learning Rate: 2e-05 (2x higher than CRD)
- Max Steps: 500 (more than CRD)
- LoRA Rank: 8
- LoRA Alpha: 16
- LoRA Dropout: 0.05
- Warmup: 10%
- Max Sequence Length: 2048
- Effective Batch Size: 8
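A few quantities implied by the hyperparameters above can be cross-checked with a short calculation. Note the per-device batch size and gradient-accumulation split are not stated in this card; only the effective batch size of 8 is.

```python
# Cross-check derived training quantities from the listed hyperparameters.

max_steps = 500
effective_batch_size = 8
warmup_ratio = 0.10
num_examples = 9340

warmup_steps = int(max_steps * warmup_ratio)        # 50 optimizer steps of LR warmup
examples_seen = max_steps * effective_batch_size    # 4000 training examples consumed
epochs = examples_seen / num_examples               # ~0.43 -- well under one full epoch

print(warmup_steps, examples_seen, round(epochs, 2))  # -> 50 4000 0.43
```

In other words, at 500 steps with an effective batch size of 8, the run sees about 4000 of the 9340 examples, i.e. less than half an epoch.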
## Notes
The SRD dataset has longer, more complex reasoning chains than CRD, which results in a higher baseline loss. The hyperparameters were adjusted accordingly.
## Part of Experiment
This model is one of three runs comparing dataset composition:
- kinzakhan1/CRD_V6 - Clinical reasoning only
- kinzakhan1/SRD_V6 - Standard reasoning only (this model)
- kinzakhan1/MIXED_V6 - Mixed dataset
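Since the base model is Llama 3.1 8B Instruct, inference prompts should follow the Llama 3.1 chat template. Below is a minimal hand-built sketch of that format; in practice, load the tokenizer for this repo (or the base model) and call `tokenizer.apply_chat_template` instead. The system and user messages here are placeholders, not examples from the training data.

```python
# Minimal sketch of the Llama 3.1 chat prompt format this model expects.
# Prefer tokenizer.apply_chat_template in real inference code; this just
# illustrates the special tokens involved.

def build_llama31_prompt(system: str, user: str) -> str:
    """Format a single-turn prompt in the Llama 3.1 chat template."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Placeholder messages for illustration only.
prompt = build_llama31_prompt(
    "You are a helpful assistant. Think step by step.",
    "If a train travels 60 km in 45 minutes, what is its average speed in km/h?",
)
print(prompt)
```

The trailing assistant header leaves the model to generate the response, which for this fine-tune should include a chain-of-thought before the final answer.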