train_wsc_42_1760454169

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3567
  • Num Input Tokens Seen: 1481040
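Since this is a PEFT adapter on top of meta-llama/Meta-Llama-3-8B-Instruct (see the framework versions below), it can be loaded by applying the adapter to the base model. A minimal sketch, assuming the adapter is published at `rbelanec/train_wsc_42_1760454169` and that `transformers`, `peft`, and access to the gated Llama 3 base weights are available:

```python
# Sketch: load the base model and apply this LoRA adapter with PEFT.
# The repo path "rbelanec/train_wsc_42_1760454169" is taken from this card.
BASE_MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_wsc_42_1760454169"

def load_model():
    # Imports are deferred so the constants above can be inspected
    # without the heavyweight dependencies installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto")
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    return tokenizer, model
```

Calling `load_model()` downloads both the base weights and the adapter, so it requires a machine with enough memory for an 8B-parameter model.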

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 30
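The cosine scheduler with a 0.05 warmup ratio corresponds to a linear ramp to the peak learning rate followed by cosine decay to zero. A sketch of that schedule, assuming roughly 3750 total optimizer steps (30 epochs at the ~125 steps per epoch implied by the training log below):

```python
import math

LEARNING_RATE = 5e-5   # peak learning rate from the hyperparameters above
WARMUP_RATIO = 0.05
TOTAL_STEPS = 3750     # assumption: 30 epochs x ~125 optimizer steps/epoch
WARMUP_STEPS = int(WARMUP_RATIO * TOTAL_STEPS)

def lr_at(step: int) -> float:
    """Linear warmup to the peak LR, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))
```

This matches the shape of the default cosine schedule in `transformers` (half a cosine cycle after warmup); the exact step count is an assumption, not something stated on this card.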

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| 0.4836        | 1.504  | 188  | 0.3577          | 73872             |
| 0.3857        | 3.008  | 376  | 0.3593          | 148192            |
| 0.3526        | 4.512  | 564  | 0.3545          | 221984            |
| 0.3354        | 6.016  | 752  | 0.3491          | 295616            |
| 0.3437        | 7.52   | 940  | 0.3592          | 370688            |
| 0.3461        | 9.024  | 1128 | 0.3659          | 444448            |
| 0.3484        | 10.528 | 1316 | 0.3567          | 519088            |
| 0.3609        | 12.032 | 1504 | 0.3475          | 592272            |
| 0.3514        | 13.536 | 1692 | 0.3546          | 667952            |
| 0.3174        | 15.04  | 1880 | 0.3536          | 741072            |
| 0.3461        | 16.544 | 2068 | 0.3564          | 815840            |
| 0.3541        | 18.048 | 2256 | 0.3555          | 889584            |
| 0.3347        | 19.552 | 2444 | 0.3568          | 964576            |
| 0.3374        | 21.056 | 2632 | 0.3553          | 1038032           |
| 0.3269        | 22.56  | 2820 | 0.3610          | 1112112           |
| 0.3286        | 24.064 | 3008 | 0.3596          | 1186496           |
| 0.3491        | 25.568 | 3196 | 0.3565          | 1261104           |
| 0.3481        | 27.072 | 3384 | 0.3570          | 1336384           |
| 0.3605        | 28.576 | 3572 | 0.3559          | 1410480           |
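The evaluation log above can be scanned programmatically to locate the best checkpoint by validation loss. A minimal sketch, with the (step, validation loss) pairs copied verbatim from the table:

```python
# Validation-loss log from the training results table: (step, validation_loss).
EVAL_LOG = [
    (188, 0.3577), (376, 0.3593), (564, 0.3545), (752, 0.3491),
    (940, 0.3592), (1128, 0.3659), (1316, 0.3567), (1504, 0.3475),
    (1692, 0.3546), (1880, 0.3536), (2068, 0.3564), (2256, 0.3555),
    (2444, 0.3568), (2632, 0.3553), (2820, 0.3610), (3008, 0.3596),
    (3196, 0.3565), (3384, 0.3570), (3572, 0.3559),
]

# Pick the row with the lowest validation loss.
best_step, best_loss = min(EVAL_LOG, key=lambda row: row[1])
```

Note that the lowest logged validation loss (0.3475 at step 1504) is lower than the 0.3567 headline loss reported at the top of this card, so the final checkpoint is not the best-scoring one in the log.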

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1