train_winogrande_1756735778

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the winogrande dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2313
  • Num Input Tokens Seen: 30120720
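Since the checkpoint is a PEFT adapter on top of meta-llama/Meta-Llama-3-8B-Instruct, it can be loaded with peft's AutoPeftModelForCausalLM. The sketch below is illustrative only: the WinoGrande prompt template is an assumption (this card does not document the format used during fine-tuning), and loading the base model requires access to the gated meta-llama repository.

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "rbelanec/train_winogrande_1756735778"

# The tokenizer may need to be loaded from the base model instead,
# if the adapter repo does not ship tokenizer files.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Downloads meta-llama/Meta-Llama-3-8B-Instruct (gated; requires access)
# and applies this adapter on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Illustrative WinoGrande-style prompt: pick which option fills the blank.
# The exact template used during fine-tuning is not documented in this card.
prompt = (
    "Sentence: The trophy doesn't fit into the brown suitcase because _ is too large.\n"
    "Option1: trophy\n"
    "Option2: suitcase\n"
    "Answer:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```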

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
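For reference, these values map onto transformers TrainingArguments roughly as sketched below. This is not the original training script: the argument names are the standard transformers ones, output_dir is a placeholder, and the original run may have set further options (logging, evaluation strategy, the PEFT config) that are not documented in this card.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="train_winogrande_1756735778",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```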

Training results

Training Loss | Epoch  | Step   | Validation Loss | Input Tokens Seen
0.2295        | 0.5000 | 9090   | 0.2313          | 1506080
0.2252        | 1.0001 | 18180  | 0.2311          | 3011568
0.2313        | 1.5001 | 27270  | 0.2313          | 4517568
0.2455        | 2.0001 | 36360  | 0.2337          | 6023712
0.2317        | 2.5001 | 45450  | 0.2318          | 7529008
0.2337        | 3.0002 | 54540  | 0.2314          | 9035904
0.2352        | 3.5002 | 63630  | 0.2336          | 10541968
0.2400        | 4.0002 | 72720  | 0.2311          | 12047824
0.2280        | 4.5002 | 81810  | 0.2317          | 13553584
0.2192        | 5.0003 | 90900  | 0.2319          | 15059504
0.2379        | 5.5003 | 99990  | 0.2313          | 16564784
0.2274        | 6.0003 | 109080 | 0.2312          | 18071824
0.2317        | 6.5004 | 118170 | 0.2314          | 19578608
0.2338        | 7.0004 | 127260 | 0.2321          | 21084064
0.2314        | 7.5004 | 136350 | 0.2312          | 22590736
0.2355        | 8.0004 | 145440 | 0.2309          | 24096816
0.2314        | 8.5005 | 154530 | 0.2315          | 25603904
0.2214        | 9.0005 | 163620 | 0.2316          | 27109968
0.2294        | 9.5005 | 172710 | 0.2314          | 28615376

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1