SRD_V6 - Standard Reasoning Model (Chain-of-Thought)

Overview

A fine-tune of Llama 3.1 8B on the Standard Reasoning Dataset (SRD, chain-of-thought), with hyperparameters adjusted relative to the CRD_V6 run.

Training Details

  • Base Model: meta-llama/Meta-Llama-3.1-8B-Instruct
  • Training Framework: Unsloth
  • Dataset: CoT Reasoning Data (CoT_reasoning_unsloth.jsonl)
  • Examples: 9340
  • Training Time: 0.33 hours
  • Final Loss: 1.9127
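For illustration, a hypothetical record from a file like CoT_reasoning_unsloth.jsonl might look as follows. The actual field names are not documented in this card; the "conversations" layout shown here is a common Unsloth convention and is assumed, and the example content is invented.

```python
import json

# Hypothetical CoT training record (schema and content are assumptions,
# not taken from the actual dataset).
record = {
    "conversations": [
        {"role": "user",
         "content": "If a train travels 60 km in 45 minutes, what is its speed in km/h?"},
        {"role": "assistant",
         "content": "Let's reason step by step. 45 minutes is 0.75 hours, "
                    "so speed = 60 / 0.75 = 80 km/h."},
    ]
}

# In a .jsonl file, each record is one JSON object per line.
line = json.dumps(record)
parsed = json.loads(line)
print(len(parsed["conversations"]))  # -> 2
```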

Hyperparameters (Adjusted for SRD)

  • Learning Rate: 2e-05 (2× the CRD_V6 rate)
  • Max Steps: 500 (more than CRD_V6)
  • LoRA Rank: 8
  • LoRA Alpha: 16
  • LoRA Dropout: 0.05
  • Warmup: 10%
  • Max Sequence Length: 2048
  • Effective Batch Size: 8
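The hyperparameters above can be sketched as an Unsloth/TRL-style config. The argument names and the per-device/accumulation split are assumptions (the card only lists the values); the derived numbers follow directly from the list.

```python
# Sketch of the SRD_V6 hyperparameters; key names are assumed, values
# come from the model card.
config = {
    "learning_rate": 2e-5,
    "max_steps": 500,
    "lora_r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "warmup_ratio": 0.10,               # "Warmup: 10%"
    "max_seq_length": 2048,
    "per_device_train_batch_size": 2,   # assumed split: 2 x 4 accumulation
    "gradient_accumulation_steps": 4,   # = 8 effective batch size
}

# Derived quantities implied by the card's values:
warmup_steps = int(config["max_steps"] * config["warmup_ratio"])
effective_batch = (config["per_device_train_batch_size"]
                   * config["gradient_accumulation_steps"])
print(warmup_steps, effective_batch)  # -> 50 8
```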

Notes

The SRD dataset contains longer, more complex reasoning chains, which results in a higher baseline loss; the learning rate and step count were raised accordingly.

Part of Experiment

  • kinzakhan1/CRD_V6 - Clinical reasoning only
  • kinzakhan1/SRD_V6 - Standard reasoning only (this model)
  • kinzakhan1/MIXED_V6 - Mixed dataset