train_wsc_42_1760465270

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4535
  • Num Input Tokens Seen: 1468632
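If the adapter was saved with PEFT (see the framework versions below), it can be loaded on top of the base model roughly as follows. This is a minimal sketch: the repository id is taken from this card, while the dtype, device placement, and the WSC-style prompt are illustrative assumptions, since the card does not document the expected input format.

```python
# Minimal loading sketch, assuming a PEFT (LoRA) adapter on top of
# meta-llama/Meta-Llama-3-8B-Instruct. Prompt format is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_42_1760465270"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# WSC-style coreference prompt (illustrative; not the documented training format).
prompt = "The trophy doesn't fit into the brown suitcase because it is too large. What is too large?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```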

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
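
As a rough illustration, these settings map onto a transformers TrainingArguments configuration along the following lines. The actual training script, LoRA/PEFT configuration, and dataset preprocessing are not documented in this card, so treat this as a sketch rather than the exact setup used.

```python
# Sketch only: maps the listed hyperparameters onto TrainingArguments.
# The real training script and PEFT/LoRA config for this run are not documented here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_42_1760465270",
    learning_rate=5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",          # AdamW with betas=(0.9, 0.999), epsilon=1e-08
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=30,
)
```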

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.3738        | 3.0   | 168  | 0.3511          | 146552            |
| 0.3553        | 6.0   | 336  | 0.3657          | 294256            |
| 0.3555        | 9.0   | 504  | 0.3502          | 439768            |
| 0.3611        | 12.0  | 672  | 0.3622          | 586448            |
| 0.3454        | 15.0  | 840  | 0.3614          | 735680            |
| 0.3497        | 18.0  | 1008 | 0.3600          | 882920            |
| 0.3609        | 21.0  | 1176 | 0.3777          | 1029792           |
| 0.3353        | 24.0  | 1344 | 0.4059          | 1176968           |
| 0.3337        | 27.0  | 1512 | 0.4415          | 1321408           |
| 0.3069        | 30.0  | 1680 | 0.4535          | 1468632           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1