---
library_name: peft
license: llama3
base_model: meta-llama/Meta-Llama-3-8B-Instruct
tags:
  - llama-factory
  - prefix-tuning
  - generated_from_trainer
model-index:
  - name: train_wsc_42_1760609151
    results: []
---

train_wsc_42_1760609151

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3522
  • Num Input Tokens Seen: 1308280
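Below is a minimal sketch of how this prefix-tuning adapter could be loaded for inference with PEFT and Transformers. The adapter repo id (rbelanec/train_wsc_42_1760609151) and the example prompt are assumptions for illustration; the exact prompt template used by llama-factory during training is not documented in this card.

```python
# Minimal loading sketch, not the exact training/inference setup.
# Assumptions: the adapter is published as "rbelanec/train_wsc_42_1760609151",
# and the prompt below is illustrative only (the training prompt template is not documented here).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_42_1760609151"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

prompt = (
    "The trophy didn't fit in the suitcase because it was too big.\n"
    "Does 'it' refer to 'the trophy'? Answer yes or no."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```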

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0005
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
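As a rough illustration, the values above map onto a Transformers TrainingArguments plus a PEFT PrefixTuningConfig as sketched below. The actual run was produced with llama-factory, and the prefix length (num_virtual_tokens) is an illustrative assumption that is not reported in this card.

```python
# Configuration sketch only; the run itself used llama-factory, not this exact code.
from transformers import TrainingArguments
from peft import PrefixTuningConfig, TaskType

peft_config = PrefixTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,  # assumption: the prefix length is not reported in this card
)

training_args = TrainingArguments(
    output_dir="train_wsc_42_1760609151",
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=30,
)
```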

Training results

| Training Loss | Epoch   | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-------:|:----:|:---------------:|:-----------------:|
| 0.414         | 1.5045  | 167  | 0.4660          | 65984             |
| 0.4193        | 3.0090  | 334  | 0.4927          | 131096            |
| 0.3295        | 4.5135  | 501  | 0.3682          | 196400            |
| 0.4134        | 6.0180  | 668  | 0.3488          | 261392            |
| 0.3588        | 7.5225  | 835  | 0.3535          | 326864            |
| 0.3683        | 9.0270  | 1002 | 0.3525          | 391800            |
| 0.3652        | 10.5315 | 1169 | 0.3491          | 458568            |
| 0.3456        | 12.0360 | 1336 | 0.3909          | 523312            |
| 0.3703        | 13.5405 | 1503 | 0.3534          | 589824            |
| 0.3643        | 15.0450 | 1670 | 0.3588          | 655200            |
| 0.3647        | 16.5495 | 1837 | 0.3570          | 721016            |
| 0.3484        | 18.0541 | 2004 | 0.3501          | 787016            |
| 0.3594        | 19.5586 | 2171 | 0.3501          | 853744            |
| 0.3425        | 21.0631 | 2338 | 0.3509          | 918752            |
| 0.3605        | 22.5676 | 2505 | 0.3516          | 984472            |
| 0.3433        | 24.0721 | 2672 | 0.3534          | 1050088           |
| 0.3693        | 25.5766 | 2839 | 0.3555          | 1115632           |
| 0.3353        | 27.0811 | 3006 | 0.3503          | 1181344           |
| 0.3507        | 28.5856 | 3173 | 0.3512          | 1246872           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1