train_wsc_42_1760453463

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3538
  • Num Input Tokens Seen: 1481040
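
Since this is a PEFT adapter on top of the base model, it can be loaded for inference roughly as follows. This is a minimal sketch, assuming the adapter repo id shown on this card, access to the gated meta-llama base weights, and an illustrative Winograd-style prompt (the exact training prompt template is not documented here):

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Load the adapter together with its base model (meta-llama/Meta-Llama-3-8B-Instruct).
# dtype and device_map are illustrative choices, not taken from the training setup.
model = AutoPeftModelForCausalLM.from_pretrained(
    "rbelanec/train_wsc_42_1760453463",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Hypothetical WSC-style input; the card does not specify the prompt format used in training.
prompt = "The trophy doesn't fit in the suitcase because it is too big. What does 'it' refer to?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```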

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
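
The hyperparameters above map onto a Hugging Face TrainingArguments configuration roughly as follows. This is a sketch, not the exact training script: argument names follow Transformers 4.51, the output_dir is assumed from the model name, and anything not listed above is left at its default:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_42_1760453463",  # assumed; matches the model name
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=30,
)
```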

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 0.3749        | 1.504  | 188  | 0.3537          | 73872             |
| 0.39          | 3.008  | 376  | 0.3577          | 148192            |
| 0.3703        | 4.512  | 564  | 0.3551          | 221984            |
| 0.3378        | 6.016  | 752  | 0.3775          | 295616            |
| 0.3765        | 7.52   | 940  | 0.3587          | 370688            |
| 0.3572        | 9.024  | 1128 | 0.3560          | 444448            |
| 0.3868        | 10.528 | 1316 | 0.3555          | 519088            |
| 0.3247        | 12.032 | 1504 | 0.3783          | 592272            |
| 0.377         | 13.536 | 1692 | 0.3674          | 667952            |
| 0.5246        | 15.04  | 1880 | 0.3576          | 741072            |
| 0.3752        | 16.544 | 2068 | 0.3610          | 815840            |
| 0.3502        | 18.048 | 2256 | 0.3510          | 889584            |
| 0.3421        | 19.552 | 2444 | 0.3494          | 964576            |
| 0.3438        | 21.056 | 2632 | 0.3575          | 1038032           |
| 0.3366        | 22.56  | 2820 | 0.3638          | 1112112           |
| 0.3274        | 24.064 | 3008 | 0.3541          | 1186496           |
| 0.3426        | 25.568 | 3196 | 0.3563          | 1261104           |
| 0.3457        | 27.072 | 3384 | 0.3544          | 1336384           |
| 0.3632        | 28.576 | 3572 | 0.3554          | 1410480           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
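
To reproduce the environment, the listed versions can be pinned as a starting point (a sketch; the CUDA-specific PyTorch build, cu128 here, may require the matching extra index URL for your platform):

```python
# Illustrative environment pinning based on the versions listed above.
# Run from a shell, e.g.:
#   pip install "peft==0.15.2" "transformers==4.51.3" "datasets==3.6.0" \
#       "tokenizers==0.21.1" "torch==2.8.0"
```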