train_wsc_123_1760444422

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3594
  • Num Input Tokens Seen: 1465808
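
To try the adapter locally, here is a minimal loading sketch, assuming the adapter is published under the repo id in this card's title (rbelanec/train_wsc_123_1760444422) and that you have access to the gated meta-llama base checkpoint; it is not taken from the original training script.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_wsc_123_1760444422"  # assumed from the card title

# Load the frozen base model, then attach the fine-tuned PEFT adapter on top.
tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()
```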

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
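
For reference, these values map onto Hugging Face TrainingArguments roughly as sketched below; the actual training script is not included in this card, so output_dir and any omitted fields are illustrative assumptions.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_123_1760444422",  # illustrative, not from the card
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=30,
)
```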

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| 0.3622        | 1.504  | 188  | 0.4002          | 73760             |
| 0.3775        | 3.008  | 376  | 0.3543          | 148032            |
| 0.3665        | 4.512  | 564  | 0.3503          | 222944            |
| 0.3633        | 6.016  | 752  | 0.3543          | 294320            |
| 0.349         | 7.52   | 940  | 0.3654          | 369248            |
| 0.3345        | 9.024  | 1128 | 0.3515          | 442000            |
| 0.3566        | 10.528 | 1316 | 0.3489          | 516624            |
| 0.3802        | 12.032 | 1504 | 0.3592          | 589072            |
| 0.3792        | 13.536 | 1692 | 0.3687          | 662256            |
| 0.3284        | 15.04  | 1880 | 0.3571          | 736272            |
| 0.3517        | 16.544 | 2068 | 0.3599          | 809824            |
| 0.3555        | 18.048 | 2256 | 0.3476          | 882480            |
| 0.3578        | 19.552 | 2444 | 0.3531          | 956000            |
| 0.3436        | 21.056 | 2632 | 0.3548          | 1028736           |
| 0.3316        | 22.56  | 2820 | 0.3564          | 1102672           |
| 0.3459        | 24.064 | 3008 | 0.3546          | 1176448           |
| 0.3677        | 25.568 | 3196 | 0.3557          | 1249968           |
| 0.3249        | 27.072 | 3384 | 0.3585          | 1322608           |
| 0.3561        | 28.576 | 3572 | 0.3575          | 1396032           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
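
To check that a local environment matches the versions above, here is a small convenience sketch (not part of the original card):

```python
# Compare installed package versions against those listed in this card.
import datasets, peft, tokenizers, torch, transformers

expected = {
    peft: "0.15.2",
    transformers: "4.51.3",
    torch: "2.8.0+cu128",
    datasets: "3.6.0",
    tokenizers: "0.21.1",
}
for module, version in expected.items():
    status = "OK" if module.__version__ == version else "MISMATCH"
    print(f"{module.__name__:<12} installed={module.__version__:<14} expected={version} {status}")
```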