train_multirc_42_1762380766

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the multirc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1476
  • Num Input Tokens Seen: 264840880
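
The framework versions below include PEFT, so this checkpoint is presumably a parameter-efficient adapter rather than a full set of model weights. A minimal loading sketch, assuming the adapter is hosted on the Hub as rbelanec/train_multirc_42_1762380766 and that MultiRC examples were formatted as passage/question/answer prompts (neither is confirmed on this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_multirc_42_1762380766"  # assumed Hub repo id for this card

# Load the frozen base model, then attach the fine-tuned adapter weights.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)

# Hypothetical MultiRC-style prompt: passage + question + candidate answer, judged correct/incorrect.
prompt = "Passage: ...\nQuestion: ...\nCandidate answer: ...\nIs the candidate answer correct?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

If the adapter is LoRA-based, model.merge_and_unload() folds it into the base weights for adapter-free inference.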

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a rough TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
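
These values map directly onto Hugging Face TrainingArguments. The sketch below is an approximate reconstruction, not the actual training script: the output directory, any gradient accumulation, and the PEFT/adapter configuration are not stated on this card and are assumed.

```python
from transformers import TrainingArguments

# Approximate reconstruction of the listed hyperparameters; anything not shown
# on this card (output_dir, logging, gradient accumulation, PEFT config) is an assumption.
training_args = TrainingArguments(
    output_dir="train_multirc_42_1762380766",  # assumed
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```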

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|---------------|-------|--------|-----------------|-------------------|
| 0.2359        | 1.0   | 6130   | 0.1697          | 13256608          |
| 0.0682        | 2.0   | 12260  | 0.1528          | 26510112          |
| 0.0341        | 3.0   | 18390  | 0.1476          | 39755376          |
| 0.2712        | 4.0   | 24520  | 0.1548          | 53010912          |
| 0.1163        | 5.0   | 30650  | 0.1635          | 66248576          |
| 0.1189        | 6.0   | 36780  | 0.1981          | 79495984          |
| 0.1695        | 7.0   | 42910  | 0.1895          | 92713360          |
| 0.0029        | 8.0   | 49040  | 0.2099          | 105934480         |
| 0.2424        | 9.0   | 55170  | 0.2331          | 119164864         |
| 0.072         | 10.0  | 61300  | 0.2711          | 132392640         |
| 0.305         | 11.0  | 67430  | 0.2960          | 145641920         |
| 0.0007        | 12.0  | 73560  | 0.3520          | 158902432         |
| 0.1829        | 13.0  | 79690  | 0.4011          | 172144032         |
| 0.3587        | 14.0  | 85820  | 0.4523          | 185378480         |
| 0.0144        | 15.0  | 91950  | 0.4646          | 198621168         |
| 0.0776        | 16.0  | 98080  | 0.4896          | 211855376         |
| 0.1005        | 17.0  | 104210 | 0.4996          | 225105296         |
| 0.0762        | 18.0  | 110340 | 0.5208          | 238352272         |
| 0.0002        | 19.0  | 116470 | 0.5091          | 251594480         |
| 0.0002        | 20.0  | 122600 | 0.5096          | 264840880         |
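
The validation loss bottoms out at epoch 3 (0.1476, matching the evaluation loss reported at the top of this card) and climbs steadily afterwards, which suggests the reported result comes from the best checkpoint rather than the final one. A few back-of-envelope figures can also be read off the table; the sketch below assumes a single device and no gradient accumulation, neither of which is stated here.

```python
# Rough checks derived from the table above (assuming a single device and
# no gradient accumulation, which this card does not state).
steps_per_epoch = 6130
train_batch_size = 4
tokens_epoch_1 = 13_256_608
total_steps = 122_600
warmup_ratio = 0.1

examples_per_epoch = steps_per_epoch * train_batch_size        # ~24,520 examples
avg_tokens_per_example = tokens_epoch_1 / examples_per_epoch   # ~541 tokens
warmup_steps = int(total_steps * warmup_ratio)                 # 12,260 steps (~2 epochs)

print(examples_per_epoch, round(avg_tokens_per_example), warmup_steps)
```

Under those assumptions, training covered roughly 24,500 examples of about 540 tokens each per epoch, and the 10% warmup spans the first two epochs.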

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1