train_multirc_42_1762240404

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the multirc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2360
  • Num Input Tokens Seen: 264840880
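
The adapter can be loaded on top of the base model with PEFT for inference. The snippet below is a minimal sketch, assuming access to the gated meta-llama/Meta-Llama-3-8B-Instruct weights; the MultiRC-style prompt shown is hypothetical, since the exact prompt template used during fine-tuning is not documented in this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_multirc_42_1762240404"

# Load tokenizer and base model, then attach the fine-tuned adapter.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Hypothetical MultiRC-style prompt; the training prompt format is not documented here.
prompt = (
    "Paragraph: ...\n"
    "Question: ...\n"
    "Candidate answer: ...\n"
    "Is the candidate answer correct? Answer yes or no."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```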

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
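
For orientation, the hyperparameters above map onto a Transformers TrainingArguments configuration roughly as sketched below. This is an assumption-laden sketch, not the actual training script: the dataset preprocessing, PEFT/LoRA settings, output directory, and logging/evaluation cadence are not documented in this card.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the training configuration from the hyperparameter list above.
training_args = TrainingArguments(
    output_dir="train_multirc_42_1762240404",  # assumed output directory
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
    eval_strategy="epoch",  # assumed; the card reports per-epoch validation loss
    logging_steps=10,       # assumed logging cadence
)
```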

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.1799        | 1.0   | 6130   | 0.1399          | 13256608          |
| 0.0262        | 2.0   | 12260  | 0.1308          | 26510112          |
| 0.0511        | 3.0   | 18390  | 0.1234          | 39755376          |
| 0.2173        | 4.0   | 24520  | 0.1261          | 53010912          |
| 0.1424        | 5.0   | 30650  | 0.1210          | 66248576          |
| 0.0721        | 6.0   | 36780  | 0.1252          | 79495984          |
| 0.1325        | 7.0   | 42910  | 0.1301          | 92713360          |
| 0.0358        | 8.0   | 49040  | 0.1273          | 105934480         |
| 0.2284        | 9.0   | 55170  | 0.1313          | 119164864         |
| 0.0154        | 10.0  | 61300  | 0.1411          | 132392640         |
| 0.2095        | 11.0  | 67430  | 0.1629          | 145641920         |
| 0.0057        | 12.0  | 73560  | 0.1854          | 158902432         |
| 0.0084        | 13.0  | 79690  | 0.1977          | 172144032         |
| 0.2741        | 14.0  | 85820  | 0.2317          | 185378480         |
| 0.1494        | 15.0  | 91950  | 0.2368          | 198621168         |
| 0.0007        | 16.0  | 98080  | 0.2623          | 211855376         |
| 0.0111        | 17.0  | 104210 | 0.2967          | 225105296         |
| 0.0617        | 18.0  | 110340 | 0.3215          | 238352272         |
| 0.0019        | 19.0  | 116470 | 0.3441          | 251594480         |
| 0.0014        | 20.0  | 122600 | 0.3499          | 264840880         |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1