train_mmlu_1754507480

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the mmlu dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1797
  • Num Input Tokens Seen: 488118104

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
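The cosine schedule with a 0.1 warmup ratio means the learning rate ramps linearly to 5e-05 over the first 10% of training steps, then decays along a half-cosine to zero. A minimal sketch of that schedule (the total step count of 224,660 is an assumption extrapolated from the table below, where epoch 1.0 falls at step 22,466 over 10 epochs):

```python
import math

def cosine_lr(step, total_steps, base_lr=5e-5, warmup_ratio=0.1):
    """Learning rate at `step` under linear warmup followed by cosine decay,
    mirroring the learning_rate / lr_scheduler settings listed above."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the warmup window.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# With total_steps ~= 224660, warmup ends at step 22466, i.e. around epoch 1.0.
print(cosine_lr(22466, 224660))  # peak LR, 5e-05
```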

Training results

Training Loss | Epoch  | Step   | Validation Loss | Input Tokens Seen
0.1076        | 0.5000 | 11233  | 0.2351          | 24389728
0.0414        | 1.0000 | 22466  | 0.2098          | 48789280
0.199         | 1.5001 | 33699  | 0.1873          | 73201984
0.0773        | 2.0001 | 44932  | 0.1797          | 97620120
0.0596        | 2.5001 | 56165  | 0.1921          | 122127480
0.0945        | 3.0001 | 67398  | 0.1902          | 146471872
0.1151        | 3.5002 | 78631  | 0.1892          | 170850208
0.1283        | 4.0002 | 89864  | 0.1858          | 195267312
0.0177        | 4.5002 | 101097 | 0.2039          | 219639056
0.0969        | 5.0002 | 112330 | 0.2092          | 244095744
0.1111        | 5.5002 | 123563 | 0.2263          | 268478944
0.1168        | 6.0003 | 134796 | 0.2209          | 292933144
0.0028        | 6.5003 | 146029 | 0.2462          | 317335480
0.0782        | 7.0003 | 157262 | 0.2456          | 341742832
0.284         | 7.5003 | 168495 | 0.2603          | 366182192
0.0982        | 8.0004 | 179728 | 0.2601          | 390549264
0.0024        | 8.5004 | 190961 | 0.2578          | 414924016
0.1062        | 9.0004 | 202194 | 0.2663          | 439335448
0.1032        | 9.5004 | 213427 | 0.2686          | 463693944
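Validation loss bottoms out around epoch 2 and climbs steadily afterwards, which is consistent with the reported evaluation loss of 0.1797 corresponding to that checkpoint. A quick check over the (epoch, validation loss) pairs transcribed from the table:

```python
# (epoch, validation loss) pairs copied from the training-results table
history = [
    (0.5, 0.2351), (1.0, 0.2098), (1.5, 0.1873), (2.0, 0.1797),
    (2.5, 0.1921), (3.0, 0.1902), (3.5, 0.1892), (4.0, 0.1858),
    (4.5, 0.2039), (5.0, 0.2092), (5.5, 0.2263), (6.0, 0.2209),
    (6.5, 0.2462), (7.0, 0.2456), (7.5, 0.2603), (8.0, 0.2601),
    (8.5, 0.2578), (9.0, 0.2663), (9.5, 0.2686),
]

# Select the checkpoint with the lowest validation loss.
best_epoch, best_loss = min(history, key=lambda p: p[1])
print(best_epoch, best_loss)  # 2.0 0.1797 — matches the reported eval loss
```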

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1