train_wsc_123_1760367295

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 7.2257
  • Num Input Tokens Seen: 1465808
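
Since this checkpoint was trained with PEFT (see Framework versions below), it loads as an adapter on top of the base model rather than as a standalone model. The following is a minimal, untested sketch of how that typically looks with `peft` and `transformers`; the adapter repo id `rbelanec/train_wsc_123_1760367295` is taken from this card's model tree, and the WSC-style prompt is purely illustrative, since the training prompt format is not documented here.

```python
# Minimal sketch: load the base model, then apply this PEFT adapter on top.
# Assumes the adapter is published at rbelanec/train_wsc_123_1760367295.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_123_1760367295"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

# Illustrative WSC-style coreference prompt; the actual prompt format used
# during training is not documented in this card.
prompt = "The trophy didn't fit in the suitcase because it was too big. What does 'it' refer to?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```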

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
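
The card does not say which WSC variant was used. Assuming it refers to SuperGLUE's WSC task as distributed on the Hugging Face Hub, a quick way to inspect the data with the `datasets` library would be:

```python
# Sketch only: assumes "wsc" means SuperGLUE's WSC config; the actual
# dataset and preprocessing used for training are not documented here.
from datasets import load_dataset

ds = load_dataset("super_glue", "wsc")
print(ds)              # DatasetDict with train/validation/test splits
print(ds["train"][0])  # fields include text, span1_text, span2_text, label
```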

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a rough TrainingArguments mapping follows the list):

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
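
For reference, these settings map onto `transformers.TrainingArguments` roughly as shown below. This is a reconstruction from the listed values, not the actual training script; the output directory name is assumed, and the dataset preprocessing and PEFT configuration are not shown in this card.

```python
# Rough mapping of the listed hyperparameters onto TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_wsc_123_1760367295",  # assumed name
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=30,
)
```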

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 0.3676        | 1.504  | 188  | 0.3917          | 73760             |
| 0.3837        | 3.008  | 376  | 0.3505          | 148032            |
| 0.366         | 4.512  | 564  | 0.3529          | 222944            |
| 0.335         | 6.016  | 752  | 0.3569          | 294320            |
| 0.3444        | 7.52   | 940  | 0.3624          | 369248            |
| 0.3306        | 9.024  | 1128 | 0.3562          | 442000            |
| 0.353         | 10.528 | 1316 | 0.3476          | 516624            |
| 0.3738        | 12.032 | 1504 | 0.3542          | 589072            |
| 0.3864        | 13.536 | 1692 | 0.3649          | 662256            |
| 0.3253        | 15.04  | 1880 | 0.3564          | 736272            |
| 0.348         | 16.544 | 2068 | 0.3542          | 809824            |
| 0.3507        | 18.048 | 2256 | 0.3494          | 882480            |
| 0.3609        | 19.552 | 2444 | 0.3476          | 956000            |
| 0.3454        | 21.056 | 2632 | 0.3523          | 1028736           |
| 0.3374        | 22.56  | 2820 | 0.3517          | 1102672           |
| 0.3512        | 24.064 | 3008 | 0.3494          | 1176448           |
| 0.3621        | 25.568 | 3196 | 0.3530          | 1249968           |
| 0.3364        | 27.072 | 3384 | 0.3540          | 1322608           |
| 0.3485        | 28.576 | 3572 | 0.3533          | 1396032           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1