train_siqa_1755694504

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the siqa (Social IQa) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5506
  • Num Input Tokens Seen: 28646680
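A minimal usage sketch is given below. It assumes this repository is a PEFT adapter for the meta-llama/Meta-Llama-3-8B-Instruct base model and that the adapter is published under the repository ID rbelanec/train_siqa_1755694504; access to the gated base checkpoint is required, and the example prompt is purely illustrative.

```python
# Minimal sketch: load the base model and attach this PEFT adapter.
# Assumes the adapter repo ID is "rbelanec/train_siqa_1755694504" and that
# access to the gated meta-llama base checkpoint has been granted.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_siqa_1755694504"  # assumed adapter repo ID

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Llama 3 Instruct expects its chat template; apply it before generation.
messages = [{"role": "user", "content": "Alex spilled coffee on a stranger. What will Alex likely do next?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```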

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
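
These values map closely onto the standard transformers TrainingArguments API. The sketch below shows one plausible way to express them; it is not the actual training script used for this run, and the output_dir name is an assumption.

```python
# Hedged sketch: the hyperparameters above expressed as transformers.TrainingArguments.
# The actual training entry point for this run is not documented in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_siqa_1755694504",  # assumed output directory name
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```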

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:------:|:---------------:|:-----------------:|
| 0.5636        | 0.5000 | 7518   | 0.5540          | 1434368           |
| 0.542         | 1.0001 | 15036  | 0.5497          | 2864912           |
| 0.6198        | 1.5001 | 22554  | 0.5551          | 4297616           |
| 0.5434        | 2.0001 | 30072  | 0.5521          | 5729232           |
| 0.529         | 2.5002 | 37590  | 0.5554          | 7161424           |
| 0.5494        | 3.0002 | 45108  | 0.5522          | 8594304           |
| 0.59          | 3.5002 | 52626  | 0.5577          | 10025776          |
| 0.5675        | 4.0003 | 60144  | 0.5522          | 11458320          |
| 0.5674        | 4.5003 | 67662  | 0.5504          | 12890928          |
| 0.5455        | 5.0003 | 75180  | 0.5519          | 14322784          |
| 0.5454        | 5.5004 | 82698  | 0.5526          | 15756256          |
| 0.5474        | 6.0004 | 90216  | 0.5505          | 17189144          |
| 0.5716        | 6.5004 | 97734  | 0.5511          | 18622072          |
| 0.5484        | 7.0005 | 105252 | 0.5505          | 20053992          |
| 0.5424        | 7.5005 | 112770 | 0.5502          | 21486200          |
| 0.5734        | 8.0005 | 120288 | 0.5509          | 22918992          |
| 0.5587        | 8.5006 | 127806 | 0.5504          | 24351152          |
| 0.5356        | 9.0006 | 135324 | 0.5506          | 25783688          |
| 0.5627        | 9.5006 | 142842 | 0.5502          | 27216936          |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
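
As a quick reproducibility check, the locally installed versions can be compared against the list above. The sketch below assumes all five libraries are importable under these package names.

```python
# Compare the local environment against the versions listed in this card.
import peft, transformers, torch, datasets, tokenizers

expected = {
    "peft": "0.15.2",
    "transformers": "4.51.3",
    "torch": "2.8.0+cu128",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}
installed = {
    "peft": peft.__version__,
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, version in expected.items():
    print(f"{name}: installed {installed[name]}, card lists {version}")
```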