train_mnli_1756729595

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the mnli dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3001
  • Num Input Tokens Seen: 312972112

Model description

This is a PEFT adapter for meta-llama/Meta-Llama-3-8B-Instruct, fine-tuned on the MNLI dataset. Further details on the adapter configuration have not been provided.

Intended uses & limitations

More information needed
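
No usage snippet is included with this card. Below is a minimal inference sketch, assuming the repository rbelanec/train_mnli_1756729595 hosts the PEFT adapter on top of the base model; the prompt template actually used during training is not documented, so the premise/hypothesis format shown here is purely illustrative.

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the base model with this adapter applied.
# (Access to the gated Llama 3 base weights is required.)
model = AutoPeftModelForCausalLM.from_pretrained(
    "rbelanec/train_mnli_1756729595",
    device_map="auto",
)
# The tokenizer comes from the base model.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Illustrative MNLI-style input; the training prompt format is an assumption.
premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."
prompt = (
    f"Premise: {premise}\n"
    f"Hypothesis: {hypothesis}\n"
    "Label (entailment, neutral, or contradiction):"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```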

Training and evaluation data

The model was fine-tuned and evaluated on the mnli (MultiNLI) dataset, a natural language inference corpus of premise-hypothesis pairs labeled entailment, neutral, or contradiction. Details of any preprocessing or prompt formatting have not been provided.

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent TrainingArguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
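
These settings map onto a transformers.TrainingArguments object roughly as sketched below. The output_dir is a placeholder, the original training script is not included with this card, and the PEFT adapter configuration is omitted because it is not documented here.

```python
from transformers import TrainingArguments

# Rough reconstruction of the reported hyperparameters; output_dir is a
# placeholder and PEFT-specific settings are not documented in this card.
training_args = TrainingArguments(
    output_dir="train_mnli_1756729595",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```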

Training results

| Training Loss | Epoch | Step    | Validation Loss | Input Tokens Seen |
|--------------:|------:|--------:|----------------:|------------------:|
| 0.0087        | 0.5   | 88358   | 0.1107          | 15656400          |
| 0.0796        | 1.0   | 176716  | 0.1006          | 31302832          |
| 0.0168        | 1.5   | 265074  | 0.0900          | 46945024          |
| 0.0840        | 2.0   | 353432  | 0.0840          | 62598968          |
| 0.0017        | 2.5   | 441790  | 0.0929          | 78243624          |
| 0.3176        | 3.0   | 530148  | 0.3892          | 93900752          |
| 0.2568        | 3.5   | 618506  | 0.3338          | 109555344         |
| 0.3384        | 4.0   | 706864  | 0.3150          | 125196704         |
| 0.2558        | 4.5   | 795222  | 0.3147          | 140844896         |
| 0.2785        | 5.0   | 883580  | 0.3086          | 156493064         |
| 0.1722        | 5.5   | 971938  | 0.3149          | 172140360         |
| 0.4220        | 6.0   | 1060296 | 0.3110          | 187789496         |
| 0.3686        | 6.5   | 1148654 | 0.3132          | 203440440         |
| 0.2015        | 7.0   | 1237012 | 0.3065          | 219083952         |
| 0.2623        | 7.5   | 1325370 | 0.3057          | 234732208         |
| 0.2767        | 8.0   | 1413728 | 0.3012          | 250382016         |
| 0.2832        | 8.5   | 1502086 | 0.3005          | 266047408         |
| 0.2416        | 9.0   | 1590444 | 0.3004          | 281673536         |
| 0.3332        | 9.5   | 1678802 | 0.3003          | 297311136         |
| 0.2742        | 10.0  | 1767160 | 0.3001          | 312972112         |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1