train_wsc_42_1760609151

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3522
  • Num Input Tokens Seen: 1308280
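
A minimal loading sketch, assuming this repository hosts a PEFT adapter on top of meta-llama/Meta-Llama-3-8B-Instruct (the gated base model must be accessible; the prompt below is illustrative only, since the exact training prompt format is not documented here):

```python
# Minimal sketch, assuming this repo is a PEFT adapter for
# meta-llama/Meta-Llama-3-8B-Instruct (base-model access is gated).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_42_1760609151"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter
model.eval()

# Illustrative WSC-style coreference prompt; adapt to your own evaluation format.
prompt = (
    "In the sentence 'The trophy doesn't fit in the suitcase because it is too big', "
    "what does 'it' refer to?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```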

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch in code follows the list):

  • learning_rate: 0.0005
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
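
A hedged sketch of how these values map onto Hugging Face TrainingArguments; output_dir is illustrative, and the PEFT adapter configuration and data pipeline are not documented here:

```python
# Sketch only: reproduces the listed hyperparameters as TrainingArguments.
# output_dir is illustrative; adapter config and data collation are not shown.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_42",  # illustrative path
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=30,
)
```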

Training results

| Training Loss | Epoch   | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-------:|:----:|:---------------:|:-----------------:|
| 0.414         | 1.5045  | 167  | 0.4660          | 65984             |
| 0.4193        | 3.0090  | 334  | 0.4927          | 131096            |
| 0.3295        | 4.5135  | 501  | 0.3682          | 196400            |
| 0.4134        | 6.0180  | 668  | 0.3488          | 261392            |
| 0.3588        | 7.5225  | 835  | 0.3535          | 326864            |
| 0.3683        | 9.0270  | 1002 | 0.3525          | 391800            |
| 0.3652        | 10.5315 | 1169 | 0.3491          | 458568            |
| 0.3456        | 12.0360 | 1336 | 0.3909          | 523312            |
| 0.3703        | 13.5405 | 1503 | 0.3534          | 589824            |
| 0.3643        | 15.0450 | 1670 | 0.3588          | 655200            |
| 0.3647        | 16.5495 | 1837 | 0.3570          | 721016            |
| 0.3484        | 18.0541 | 2004 | 0.3501          | 787016            |
| 0.3594        | 19.5586 | 2171 | 0.3501          | 853744            |
| 0.3425        | 21.0631 | 2338 | 0.3509          | 918752            |
| 0.3605        | 22.5676 | 2505 | 0.3516          | 984472            |
| 0.3433        | 24.0721 | 2672 | 0.3534          | 1050088           |
| 0.3693        | 25.5766 | 2839 | 0.3555          | 1115632           |
| 0.3353        | 27.0811 | 3006 | 0.3503          | 1181344           |
| 0.3507        | 28.5856 | 3173 | 0.3512          | 1246872           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
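
To sanity-check a local environment against the versions above (package names are the standard PyPI ones; exact pins are not strictly required, this just prints what is installed):

```python
# Quick report of installed versions for comparison with the list above.
import datasets
import peft
import tokenizers
import torch
import transformers

for name, module in [("PEFT", peft), ("Transformers", transformers),
                     ("Pytorch", torch), ("Datasets", datasets),
                     ("Tokenizers", tokenizers)]:
    print(f"{name}: {module.__version__}")
```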