train_stsb_101112_1760638039

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the STS-B (Semantic Textual Similarity Benchmark) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4492
  • Num input tokens seen: 8,712,528
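To run inference, load the base model and attach the adapter weights with PEFT. A minimal sketch, assuming the adapter is hosted as rbelanec/train_stsb_101112_1760638039 and that you have access to the gated Llama 3 base model; the prompt template is an assumption, since the card does not document how STS-B pairs were formatted during training:

```python
# Minimal inference sketch. Assumes access to the gated Llama 3 base model
# and that this adapter is published as rbelanec/train_stsb_101112_1760638039.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_stsb_101112_1760638039"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the fine-tuned adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base_model, adapter_id)

# Hypothetical prompt format; the actual training template is not documented.
prompt = "sentence1: A man is playing a guitar. sentence2: A man plays guitar."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```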

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
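
These settings map directly onto Hugging Face TrainingArguments. A sketch of the configuration under the assumption that the run used the standard Trainer API; the actual training script is not included in the card:

```python
# Hypothetical reconstruction of the training configuration from the
# hyperparameters listed above; the real script is not part of this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_stsb_101112_1760638039",
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
    eval_strategy="epoch",  # the results table reports validation loss once per epoch
)
```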

Training results

The reported evaluation loss of 0.4492 corresponds to the epoch-3 checkpoint, the minimum in the table; validation loss rises steadily in later epochs, indicating overfitting.

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|--------------:|------:|------:|----------------:|------------------:|
| 0.5764        | 1.0   | 1294  | 0.4940          | 434624            |
| 0.3625        | 2.0   | 2588  | 0.4568          | 869056            |
| 0.3437        | 3.0   | 3882  | 0.4492          | 1304160           |
| 0.3981        | 4.0   | 5176  | 0.4617          | 1740416           |
| 0.2690        | 5.0   | 6470  | 0.5114          | 2175568           |
| 0.2937        | 6.0   | 7764  | 0.5567          | 2611168           |
| 0.3326        | 7.0   | 9058  | 0.5906          | 3047200           |
| 0.3442        | 8.0   | 10352 | 0.6815          | 3482720           |
| 0.2219        | 9.0   | 11646 | 0.7417          | 3918416           |
| 0.1800        | 10.0  | 12940 | 0.8248          | 4355072           |
| 0.1435        | 11.0  | 14234 | 0.9338          | 4790336           |
| 0.1133        | 12.0  | 15528 | 1.1174          | 5227040           |
| 0.0715        | 13.0  | 16822 | 1.2759          | 5662848           |
| 0.0747        | 14.0  | 18116 | 1.4643          | 6099600           |
| 0.0534        | 15.0  | 19410 | 1.6499          | 6534256           |
| 0.0093        | 16.0  | 20704 | 1.8866          | 6968992           |
| 0.0028        | 17.0  | 21998 | 1.9613          | 7405040           |
| 0.0004        | 18.0  | 23292 | 2.0310          | 7840784           |
| 0.0002        | 19.0  | 24586 | 2.0380          | 8276160           |
| 0.0003        | 20.0  | 25880 | 2.0479          | 8712528           |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
