train_wsc_42_1760451064

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the wsc (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set (a hedged loading example follows the list):

  • Loss: 0.3545
  • Num Input Tokens Seen: 1481040
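
Because this repository contains a PEFT adapter rather than full model weights, inference requires loading the base model first and attaching the adapter. The snippet below is a minimal sketch; the dtype, device placement, example prompt, and generation settings are illustrative assumptions, not values from the card.

```python
# Minimal sketch of loading this PEFT adapter for inference.
# Assumptions (not from the card): bfloat16 weights, device_map="auto",
# the example prompt, and the generation settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_42_1760451064"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the adapter
model.eval()

# WSC-style coreference prompt (illustrative only).
prompt = (
    "The trophy doesn't fit into the brown suitcase because it is too large. "
    "What does 'it' refer to?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```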

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding TrainingArguments follows the list):

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
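
For reference, here is a hedged reconstruction of the training configuration above as transformers TrainingArguments. Field names follow the Trainer API; anything not listed on the card (such as output_dir) is an assumption.

```python
# Hedged reconstruction of the hyperparameter list above.
# Field names follow transformers.TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_42_1760451064",  # assumption: not stated on the card
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=30,
)
```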

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 0.3815        | 1.504  | 188  | 0.3588          | 73872             |
| 0.5568        | 3.008  | 376  | 0.4372          | 148192            |
| 0.4117        | 4.512  | 564  | 0.3896          | 221984            |
| 0.3253        | 6.016  | 752  | 0.3690          | 295616            |
| 0.3936        | 7.52   | 940  | 0.3659          | 370688            |
| 0.4325        | 9.024  | 1128 | 0.4053          | 444448            |
| 0.4065        | 10.528 | 1316 | 0.3663          | 519088            |
| 0.3462        | 12.032 | 1504 | 0.3871          | 592272            |
| 0.3683        | 13.536 | 1692 | 0.3882          | 667952            |
| 0.4389        | 15.04  | 1880 | 0.4057          | 741072            |
| 0.4152        | 16.544 | 2068 | 0.3668          | 815840            |
| 0.353         | 18.048 | 2256 | 0.3497          | 889584            |
| 0.3771        | 19.552 | 2444 | 0.3543          | 964576            |
| 0.3377        | 21.056 | 2632 | 0.3533          | 1038032           |
| 0.3419        | 22.56  | 2820 | 0.3698          | 1112112           |
| 0.3335        | 24.064 | 3008 | 0.3605          | 1186496           |
| 0.3445        | 25.568 | 3196 | 0.3572          | 1261104           |
| 0.3473        | 27.072 | 3384 | 0.3560          | 1336384           |
| 0.3622        | 28.576 | 3572 | 0.3553          | 1410480           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1