train_hellaswag_1756735777

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the hellaswag dataset. It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 0.4628
  • Num Input Tokens Seen: 99399984
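
Below is a minimal loading/inference sketch, not taken from the card: it assumes the adapter repo id rbelanec/train_hellaswag_1756735777, access to the gated meta-llama/Meta-Llama-3-8B-Instruct base weights, and uses Transformers plus PEFT's PeftModel; the prompt and generation settings are illustrative only.

```python
# Minimal loading/inference sketch (assumptions noted above); requires access
# to the gated base model and this adapter repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_hellaswag_1756735777"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# HellaSwag-style continuation: the model completes a short everyday context.
prompt = "A man is standing on a ladder and cleaning the gutters. He"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```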

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
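
As a rough illustration, the values above map onto Transformers TrainingArguments fields as in the sketch below. The output_dir is a placeholder, and the actual training setup (Trainer, PEFT adapter config, data preprocessing) is not shown in this card and may differ.

```python
# Hyperparameter sketch only: mirrors the values listed above as
# TrainingArguments fields; everything else is assumed, not from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_hellaswag_1756735777",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```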

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:------:|:---------------:|:-----------------:|
| 0.4774        | 0.5000 | 8979   | 0.4642          | 4965184           |
| 0.4787        | 1.0001 | 17958  | 0.4646          | 9947520           |
| 0.4705        | 1.5001 | 26937  | 0.4629          | 14913744          |
| 0.466         | 2.0001 | 35916  | 0.4634          | 19885840          |
| 0.465         | 2.5001 | 44895  | 0.4631          | 24848464          |
| 0.4722        | 3.0002 | 53874  | 0.4633          | 29830256          |
| 0.4608        | 3.5002 | 62853  | 0.4627          | 34790608          |
| 0.4643        | 4.0002 | 71832  | 0.4635          | 39759792          |
| 0.4598        | 4.5003 | 80811  | 0.4630          | 44726624          |
| 0.4527        | 5.0003 | 89790  | 0.4630          | 49707056          |
| 0.4629        | 5.5003 | 98769  | 0.4627          | 54679744          |
| 0.4604        | 6.0003 | 107748 | 0.4635          | 59650096          |
| 0.4588        | 6.5004 | 116727 | 0.4627          | 64626528          |
| 0.4583        | 7.0004 | 125706 | 0.4633          | 69601680          |
| 0.4669        | 7.5004 | 134685 | 0.4626          | 74575408          |
| 0.4603        | 8.0004 | 143664 | 0.4629          | 79549952          |
| 0.4694        | 8.5005 | 152643 | 0.4631          | 84520688          |
| 0.4613        | 9.0005 | 161622 | 0.4630          | 89486320          |
| 0.4569        | 9.5005 | 170601 | 0.4630          | 94447840          |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1