train_wsc_101112_1760446104

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4362
  • Num Input Tokens Seen: 1471184

Model description

More information needed

Intended uses & limitations

More information needed
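
The snippet below is a minimal sketch of loading this adapter for inference with the PEFT and Transformers libraries. It assumes the adapter is hosted at rbelanec/train_wsc_101112_1760446104 and that access to the gated meta-llama/Meta-Llama-3-8B-Instruct base model has been granted; the example prompt is purely illustrative, since the prompt format used during fine-tuning is not documented in this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_101112_1760446104"  # adapter repo for this card

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Attach the fine-tuned adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Illustrative Winograd-style query; the training prompt format is an assumption.
prompt = "The trophy does not fit in the suitcase because it is too big. What is too big?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```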

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
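
As a rough illustration, the hyperparameters above might map onto a Hugging Face TrainingArguments configuration along these lines. The actual training script is not included in the card, so anything beyond the listed values (for example the output directory and the exact argument mapping) is an assumption.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above; other arguments are left at defaults.
training_args = TrainingArguments(
    output_dir="train_wsc_101112_1760446104",  # assumed output location
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=30,
)
```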

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 0.3629        | 1.504  | 188  | 0.4066          | 74288             |
| 0.3344        | 3.008  | 376  | 0.4253          | 147040            |
| 0.3523        | 4.512  | 564  | 0.3512          | 221408            |
| 0.3645        | 6.016  | 752  | 0.3525          | 294736            |
| 0.3869        | 7.52   | 940  | 0.3689          | 368400            |
| 0.3533        | 9.024  | 1128 | 0.3554          | 441968            |
| 0.3493        | 10.528 | 1316 | 0.3552          | 514960            |
| 0.3368        | 12.032 | 1504 | 0.3549          | 588032            |
| 0.3421        | 13.536 | 1692 | 0.3556          | 662784            |
| 0.3505        | 15.04  | 1880 | 0.3548          | 735760            |
| 0.3536        | 16.544 | 2068 | 0.3628          | 809088            |
| 0.3419        | 18.048 | 2256 | 0.3575          | 883568            |
| 0.3534        | 19.552 | 2444 | 0.3595          | 958720            |
| 0.3313        | 21.056 | 2632 | 0.3684          | 1031776           |
| 0.3376        | 22.56  | 2820 | 0.3696          | 1105632           |
| 0.3543        | 24.064 | 3008 | 0.3914          | 1179856           |
| 0.3551        | 25.568 | 3196 | 0.4112          | 1253280           |
| 0.328         | 27.072 | 3384 | 0.4254          | 1327824           |
| 0.2946        | 28.576 | 3572 | 0.4310          | 1400944           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
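
A small convenience sketch for checking that a local environment matches the versions listed above; exact patch-level matches are likely not required for inference.

```python
import datasets
import peft
import tokenizers
import torch
import transformers

# Print installed versions to compare against the framework versions listed above.
for name, module in [
    ("PEFT", peft),
    ("Transformers", transformers),
    ("PyTorch", torch),
    ("Datasets", datasets),
    ("Tokenizers", tokenizers),
]:
    print(f"{name}: {module.__version__}")
```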