train_stsb_101112_1760638038

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

  • Loss: 3.3077
  • Num Input Tokens Seen: 8712528
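
This is a PEFT adapter (see the framework versions below), so the base model is loaded first and the adapter weights are applied on top. A minimal usage sketch, assuming the adapter is hosted as rbelanec/train_stsb_101112_1760638038 and that the standard peft/transformers loading path applies:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER = "rbelanec/train_stsb_101112_1760638038"  # repo id assumed from this card

# Load the frozen base model, then attach the fine-tuned PEFT adapter weights.
tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()
```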

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged TrainingArguments sketch follows the list):

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
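
The list above maps directly onto the standard transformers TrainingArguments. A hedged reconstruction follows; the actual training script is not included in this card, so output_dir is a placeholder:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_stsb_101112_1760638038",  # placeholder; actual path unknown
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```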

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.7815        | 1.0   | 1294  | 0.6249          | 434624            |
| 0.4025        | 2.0   | 2588  | 0.5137          | 869056            |
| 0.3955        | 3.0   | 3882  | 0.4564          | 1304160           |
| 0.4456        | 4.0   | 5176  | 0.4475          | 1740416           |
| 0.3391        | 5.0   | 6470  | 0.4580          | 2175568           |
| 0.4272        | 6.0   | 7764  | 0.4402          | 2611168           |
| 0.3837        | 7.0   | 9058  | 0.4400          | 3047200           |
| 0.4752        | 8.0   | 10352 | 0.4363          | 3482720           |
| 0.3787        | 9.0   | 11646 | 0.4365          | 3918416           |
| 0.4897        | 10.0  | 12940 | 0.4465          | 4355072           |
| 0.4996        | 11.0  | 14234 | 0.4438          | 4790336           |
| 0.3837        | 12.0  | 15528 | 0.4662          | 5227040           |
| 0.3313        | 13.0  | 16822 | 0.4599          | 5662848           |
| 0.3999        | 14.0  | 18116 | 0.4828          | 6099600           |
| 0.2922        | 15.0  | 19410 | 0.4877          | 6534256           |
| 0.2866        | 16.0  | 20704 | 0.5080          | 6968992           |
| 0.2577        | 17.0  | 21998 | 0.5373          | 7405040           |
| 0.2791        | 18.0  | 23292 | 0.5483          | 7840784           |
| 0.2941        | 19.0  | 24586 | 0.5602          | 8276160           |
| 0.4091        | 20.0  | 25880 | 0.5618          | 8712528           |
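
Validation loss reaches its minimum at epoch 8 (0.4363) and rises steadily afterwards, so the final epoch-20 checkpoint is likely overfit. If the run used the Trainer API (not stated in this card), the best checkpoint can be retained automatically; a sketch:

```python
from transformers import TrainingArguments, EarlyStoppingCallback

# Keep the checkpoint with the lowest validation loss instead of the last one.
args = TrainingArguments(
    output_dir="train_stsb_101112_1760638038",  # placeholder; actual path unknown
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
# Stop early once validation loss fails to improve for 3 consecutive epochs:
# Trainer(..., args=args, callbacks=[EarlyStoppingCallback(early_stopping_patience=3)])
```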

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4