train_stsb_101112_1760638036

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8620
  • Num Input Tokens Seen: 7733736

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
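
As a minimal sketch of how these hyperparameters might map onto a Transformers + PEFT training setup: the values mirrored from the list above come from this card, while the LoRA adapter settings and the model-loading details are illustrative assumptions (the card does not specify them).

```python
# Minimal sketch, assuming a LoRA fine-tune with Transformers + PEFT.
# Only the values matching the hyperparameter list above come from the card;
# the LoRA config below is a hypothetical placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical adapter settings -- not stated in this card.
peft_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")
model = get_peft_model(model, peft_config)

args = TrainingArguments(
    output_dir="train_stsb_101112_1760638036",
    learning_rate=1e-5,                # learning_rate
    per_device_train_batch_size=4,     # train_batch_size
    per_device_eval_batch_size=4,      # eval_batch_size
    seed=101112,                       # seed
    optim="adamw_torch",               # optimizer
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",        # lr_scheduler_type
    warmup_ratio=0.1,                  # lr_scheduler_warmup_ratio
    num_train_epochs=20,               # num_epochs
)
```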

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.6426        | 2.0   | 2300  | 0.7783          | 773864            |
| 0.5803        | 4.0   | 4600  | 0.6503          | 1549984           |
| 0.4027        | 6.0   | 6900  | 0.5849          | 2323056           |
| 0.4471        | 8.0   | 9200  | 0.5755          | 3095688           |
| 0.4379        | 10.0  | 11500 | 0.5556          | 3868984           |
| 0.3689        | 12.0  | 13800 | 0.6094          | 4639416           |
| 0.4081        | 14.0  | 16100 | 0.6797          | 5412280           |
| 0.2179        | 16.0  | 18400 | 0.7459          | 6187248           |
| 0.3027        | 18.0  | 20700 | 0.8429          | 6961472           |
| 0.2452        | 20.0  | 23000 | 0.8620          | 7733736           |
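
Note that validation loss reaches its minimum at epoch 10 (0.5556) and rises steadily afterwards, so the final checkpoint (0.8620) is not the best one by eval loss. A sketch of standard TrainingArguments options that retain the best checkpoint instead of the last; these are not confirmed settings from this run:

```python
# Sketch only: keep the checkpoint with the lowest eval_loss rather than the last.
# These flags are standard Transformers options, not settings from this card.
from transformers import TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="train_stsb_101112_1760638036",
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
# Optionally passed to Trainer(..., callbacks=[EarlyStoppingCallback(early_stopping_patience=3)])
```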

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
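
To run inference, something like the following should work: a sketch assuming the adapter weights are hosted on the Hub under rbelanec/train_stsb_101112_1760638036 (taken from the model name above) and that the prompt format matches whatever was used during training, which this card does not document.

```python
# Sketch: load the base model and attach this PEFT adapter.
# The adapter repo id is assumed from the model name; the prompt is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model = PeftModel.from_pretrained(base, "rbelanec/train_stsb_101112_1760638036")

inputs = tokenizer("sentence 1 ... sentence 2 ...", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```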