SRD_V6 - Standard Reasoning Model (Chain-of-Thought)

Overview

A fine-tune of Llama 3.1 8B on the Standard Reasoning Dataset (SRD, chain-of-thought), with hyperparameters adjusted relative to the CRD_V6 run.

Training Details

  • Base Model: meta-llama/Meta-Llama-3.1-8B-Instruct
  • Training Framework: Unsloth
  • Dataset: CoT Reasoning Data (CoT_reasoning_unsloth.jsonl)
  • Examples: 9340
  • Training Time: 0.33 hours
  • Final Loss: 1.9127
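For illustration, a hypothetical record from a file like CoT_reasoning_unsloth.jsonl might look as follows. The actual field names are not documented in this card; the "conversations" layout shown here is a common Unsloth convention and is assumed, and the example content is invented.

```python
import json

# Hypothetical CoT training record (schema and content are assumptions,
# not taken from the actual dataset).
record = {
    "conversations": [
        {"role": "user",
         "content": "If a train travels 60 km in 45 minutes, what is its speed in km/h?"},
        {"role": "assistant",
         "content": "Let's reason step by step. 45 minutes is 0.75 hours, "
                    "so speed = 60 / 0.75 = 80 km/h."},
    ]
}

# In a .jsonl file, each record is one JSON object per line.
line = json.dumps(record)
parsed = json.loads(line)
print(len(parsed["conversations"]))  # -> 2
```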

Hyperparameters (Adjusted for SRD)

  • Learning Rate: 2e-05 (2× the CRD_V6 rate)
  • Max Steps: 500 (more than CRD_V6)
  • LoRA Rank: 8
  • LoRA Alpha: 16
  • LoRA Dropout: 0.05
  • Warmup: 10%
  • Max Sequence Length: 2048
  • Effective Batch Size: 8
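The hyperparameters above can be sketched as an Unsloth/TRL-style config. The argument names and the per-device/accumulation split are assumptions (the card only lists the values); the derived numbers follow directly from the list.

```python
# Sketch of the SRD_V6 hyperparameters; key names are assumed, values
# come from the model card.
config = {
    "learning_rate": 2e-5,
    "max_steps": 500,
    "lora_r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "warmup_ratio": 0.10,               # "Warmup: 10%"
    "max_seq_length": 2048,
    "per_device_train_batch_size": 2,   # assumed split: 2 x 4 accumulation
    "gradient_accumulation_steps": 4,   # = 8 effective batch size
}

# Derived quantities implied by the card's values:
warmup_steps = int(config["max_steps"] * config["warmup_ratio"])
effective_batch = (config["per_device_train_batch_size"]
                   * config["gradient_accumulation_steps"])
print(warmup_steps, effective_batch)  # -> 50 8
```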

Notes

The SRD dataset contains longer, more complex reasoning chains, which results in a higher baseline loss; the learning rate and step count were raised accordingly.

Part of Experiment

  • kinzakhan1/CRD_V6 - Clinical reasoning only
  • kinzakhan1/SRD_V6 - Standard reasoning only (this model)
  • kinzakhan1/MIXED_V6 - Mixed dataset