train_stsb_101112_1760638041

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4620
  • Num Input Tokens Seen: 8712528
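Because this repository ships a PEFT adapter rather than full model weights, a minimal loading sketch looks like the following; the rbelanec/train_stsb_101112_1760638041 repository id is taken from this card, while the dtype and device placement are assumptions the card does not specify:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the frozen base model first; dtype and device_map are assumptions
# (device_map="auto" additionally requires the accelerate package).
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the fine-tuned adapter weights from this repository.
model = PeftModel.from_pretrained(base, "rbelanec/train_stsb_101112_1760638041")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
```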

Model description

This checkpoint is a PEFT adapter for meta-llama/Meta-Llama-3-8B-Instruct: only the adapter weights produced by parameter-efficient fine-tuning on stsb are stored here, and the base model must be loaded separately (see the loading sketch above).

Intended uses & limitations

More information needed

Training and evaluation data

The model was fine-tuned and evaluated on the stsb dataset named above; this identifier usually refers to the Semantic Textual Similarity Benchmark (STS-B) from GLUE. Details about splits and preprocessing are not documented.

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
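These settings map roughly onto transformers.TrainingArguments as sketched below; output_dir is a placeholder, and the PEFT/LoRA adapter configuration is not documented in this card, so only the values listed above are filled in:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_stsb_101112_1760638041",  # placeholder, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```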

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 1.3293        | 1.0   | 1294  | 1.0926          | 434624            |
| 0.5139        | 2.0   | 2588  | 0.6458          | 869056            |
| 0.4608        | 3.0   | 3882  | 0.5673          | 1304160           |
| 0.584         | 4.0   | 5176  | 0.5370          | 1740416           |
| 0.3868        | 5.0   | 6470  | 0.5146          | 2175568           |
| 0.4999        | 6.0   | 7764  | 0.5019          | 2611168           |
| 0.4161        | 7.0   | 9058  | 0.4916          | 3047200           |
| 0.522         | 8.0   | 10352 | 0.4850          | 3482720           |
| 0.464         | 9.0   | 11646 | 0.4782          | 3918416           |
| 0.5449        | 10.0  | 12940 | 0.4738          | 4355072           |
| 0.62          | 11.0  | 14234 | 0.4719          | 4790336           |
| 0.5518        | 12.0  | 15528 | 0.4678          | 5227040           |
| 0.4453        | 13.0  | 16822 | 0.4661          | 5662848           |
| 0.4771        | 14.0  | 18116 | 0.4656          | 6099600           |
| 0.5392        | 15.0  | 19410 | 0.4644          | 6534256           |
| 0.4023        | 16.0  | 20704 | 0.4643          | 6968992           |
| 0.3908        | 17.0  | 21998 | 0.4630          | 7405040           |
| 0.5031        | 18.0  | 23292 | 0.4620          | 7840784           |
| 0.3964        | 19.0  | 24586 | 0.4628          | 8276160           |
| 0.5761        | 20.0  | 25880 | 0.4631          | 8712528           |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
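As a quick sanity check, the versions listed above can be compared against the active environment; this is an illustrative snippet, and exact patch versions are unlikely to be critical:

```python
# Print installed versions for comparison with the list above.
import datasets
import peft
import tokenizers
import torch
import transformers

for mod in (peft, transformers, torch, datasets, tokenizers):
    print(f"{mod.__name__}: {mod.__version__}")
```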