train_multirc_1755694495

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the multirc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4210
  • Num Input Tokens Seen: 117044976
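
This repository contains a PEFT adapter rather than full model weights, so it is loaded on top of the base model. Below is a minimal loading sketch, not the exact script used for this card: the adapter id rbelanec/train_multirc_1755694495 and base model come from this card, while dtype and device placement are assumptions.

```python
# Minimal loading sketch (assumes access to the gated Meta-Llama-3 base weights
# and a GPU with bf16 support; adjust dtype/device_map for your hardware).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_multirc_1755694495"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```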

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
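
The original training script is not included in this card. As a rough sketch, the settings above map onto the standard transformers TrainingArguments as shown below; the output directory name is an assumption, and anything not listed above is left at its default.

```python
# Rough mapping of the hyperparameters listed above onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_multirc_1755694495",  # assumed output directory name
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```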

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Input Tokens Seen |
|---------------|--------|--------|-----------------|-------------------|
| 0.3106        | 0.5000 | 6130   | 0.3370          | 5867968           |
| 0.2456        | 1.0001 | 12260  | 0.2171          | 11717296          |
| 0.0183        | 1.5001 | 18390  | 0.1600          | 17588368          |
| 0.0191        | 2.0002 | 24520  | 0.1445          | 23419920          |
| 0.0116        | 2.5002 | 30650  | 0.1618          | 29254768          |
| 0.0151        | 3.0002 | 36780  | 0.1502          | 35127456          |
| 0.0071        | 3.5003 | 42910  | 0.1629          | 40997712          |
| 0.0549        | 4.0003 | 49040  | 0.1801          | 46843184          |
| 0.0045        | 4.5004 | 55170  | 0.1625          | 52692880          |
| 0.0028        | 5.0004 | 61300  | 0.1730          | 58550736          |
| 0.0007        | 5.5004 | 67430  | 0.2433          | 64403536          |
| 0.0019        | 6.0005 | 73560  | 0.2333          | 70253968          |
| 0.1469        | 6.5005 | 79690  | 0.2315          | 76098208          |
| 0.004         | 7.0006 | 85820  | 0.2299          | 81942784          |
| 0.0005        | 7.5006 | 91950  | 0.2834          | 87796256          |
| 0.0006        | 8.0007 | 98080  | 0.2994          | 93652576          |
| 0.0           | 8.5007 | 104210 | 0.3707          | 99519344          |
| 0.0           | 9.0007 | 110340 | 0.3787          | 105351168         |
| 0.0           | 9.5008 | 116470 | 0.4142          | 111213008         |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
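
To reproduce results it can help to match the versions listed above. A quick check, assuming the packages are importable under their usual names:

```python
# Quick environment check against the framework versions listed above.
import datasets
import peft
import tokenizers
import torch
import transformers

expected = {
    "peft": "0.15.2",
    "transformers": "4.51.3",
    "torch": "2.8.0+cu128",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}
installed = {
    "peft": peft.__version__,
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    print(f"{name}: installed {installed[name]}, card lists {want}")
```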