train_mmlu_1756729610

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the mmlu dataset. It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 0.6928
  • Num Input Tokens Seen: 431580728
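
The snippet below is a minimal sketch of how a PEFT adapter like this one is typically loaded on top of its base model. It assumes the adapter repository id is rbelanec/train_mmlu_1756729610 and that you have access to meta-llama/Meta-Llama-3-8B-Instruct; it is not the author's exact inference setup.

```python
# Hedged sketch: load the PEFT adapter on top of the base Llama 3 model.
# Assumes the adapter repo id "rbelanec/train_mmlu_1756729610" and gated
# access to meta-llama/Meta-Llama-3-8B-Instruct.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_mmlu_1756729610"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "Which planet in the solar system is known as the Red Planet?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```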

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
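
For reference, the configuration below is a hedged sketch of how these hyperparameters map onto transformers.TrainingArguments. The output_dir is a placeholder, unlisted settings keep their defaults, and the original run may have used a different training wrapper (for example, a PEFT/LoRA recipe).

```python
# Hedged sketch: the listed hyperparameters expressed as TrainingArguments.
# output_dir is a placeholder; settings not listed in the card keep defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_mmlu_1756729610",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```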

Training results

Training Loss   Epoch    Step     Validation Loss   Input Tokens Seen
0.6976          0.5000   22465    0.6941            21578576
0.635           1.0000   44930    0.6968            43161200
0.1422          1.5000   67395    0.2550            64739536
0.0345          2.0000   89860    0.1957            86326488
0.0213          2.5001   112325   0.2024            107942872
0.0137          3.0001   134790   0.2119            129482000
0.2812          3.5001   157255   0.1778            151041936
0.7063          4.0001   179720   0.7046            172640384
0.7094          4.5001   202185   0.6963            194176752
0.7028          5.0001   224650   0.6953            215777904
0.7204          5.5001   247115   0.6934            237357552
0.6815          6.0001   269580   0.6961            258952232
0.6892          6.5001   292045   0.6956            280508424
0.6907          7.0002   314510   0.6925            302103728
0.6873          7.5002   336975   0.6934            323714608
0.6983          8.0002   359440   0.6941            345271088
0.6993          8.5002   381905   0.6937            366850992
0.7075          9.0002   404370   0.6923            388444744
0.7275          9.5002   426835   0.6925            410013448

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1