train_siqa_1756735776

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the siqa dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5492
  • Num Input Tokens Seen: 28646680
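
Since usage is not yet documented in this card, the following is a minimal loading sketch. It assumes the adapter is published as rbelanec/train_siqa_1756735776 (the repo this card describes) and that you have access to the gated meta-llama/Meta-Llama-3-8B-Instruct base model; the chat-style prompt is purely illustrative, since the exact formatting used for the siqa data is not documented here.

```python
# Hedged sketch: load the PEFT adapter on top of its base model and run one generation.
# Assumptions (not stated in this card): adapter repo id, prompt template, bf16/auto device placement.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "rbelanec/train_siqa_1756735776"

# Loads meta-llama/Meta-Llama-3-8B-Instruct (recorded in the adapter config) plus the adapter weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Illustrative Social IQa-style prompt: context, question, and candidate answers.
messages = [
    {"role": "user", "content": (
        "Context: Tracy didn't go home that evening and resisted Riley's attacks.\n"
        "Question: What does Tracy need to do before this?\n"
        "A) make a new plan\nB) go home and see Riley\nC) find somewhere to go\n"
        "Answer:"
    )},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=8)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```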

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
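
The original training script is not included in this card, but as a rough guide, the settings above map onto transformers TrainingArguments as sketched below. The output_dir is a placeholder, and eval_steps=7518 is inferred from the half-epoch evaluation interval visible in the results table; both are assumptions rather than documented values.

```python
# Hedged sketch: the listed hyperparameters expressed as transformers TrainingArguments.
# This is NOT the original training configuration; output_dir and eval_steps are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_siqa_1756735776",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
    eval_strategy="steps",
    eval_steps=7518,  # inferred from the ~0.5-epoch evaluation interval in the results table
)
```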

Training results

Training Loss   Epoch    Step     Validation Loss   Input Tokens Seen
0.5663          0.5000   7518     0.5535            1434368
0.5459          1.0001   15036    0.5499            2864912
0.6257          1.5001   22554    0.5551            4297616
0.5411          2.0001   30072    0.5522            5729232
0.5326          2.5002   37590    0.5547            7161424
0.5501          3.0002   45108    0.5531            8594304
0.5861          3.5002   52626    0.5572            10025776
0.5616          4.0003   60144    0.5524            11458320
0.572           4.5003   67662    0.5502            12890928
0.5497          5.0003   75180    0.5521            14322784
0.552           5.5004   82698    0.5531            15756256
0.5462          6.0004   90216    0.5507            17189144
0.5767          6.5004   97734    0.5510            18622072
0.559           7.0005   105252   0.5498            20053992
0.5565          7.5005   112770   0.5491            21486200
0.5579          8.0005   120288   0.5505            22918992
0.5769          8.5006   127806   0.5495            24351152
0.528           9.0006   135324   0.5497            25783688
0.5405          9.5006   142842   0.5493            27216936

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1