train_cb_123_1757596077

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3384
  • Num Input Tokens Seen: 633448
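
This is a PEFT adapter rather than a full model, so it is loaded on top of the base model. Below is a minimal loading sketch, assuming the adapter repo id is rbelanec/train_cb_123_1757596077 and that you have access to the gated Llama 3 base weights:

```python
# Minimal sketch: load the PEFT adapter on top of the base model.
# Assumption: the adapter is published at rbelanec/train_cb_123_1757596077.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cb_123_1757596077"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
```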

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
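
A minimal sketch of how these settings map onto transformers.TrainingArguments. It mirrors only the values listed above; the actual training script, dataset preprocessing, and PEFT/LoRA configuration for this run are not documented in this card:

```python
# Sketch only: mirrors the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_cb_123_1757596077",
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    optim="adamw_torch",        # betas=(0.9, 0.999), epsilon=1e-08 are the defaults
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```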

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.7344        | 1.0   | 113  | 1.6176          | 31992             |
| 0.2992        | 2.0   | 226  | 0.3345          | 63544             |
| 0.2043        | 3.0   | 339  | 0.2614          | 95320             |
| 0.3552        | 4.0   | 452  | 0.2170          | 127288            |
| 0.0466        | 5.0   | 565  | 0.3203          | 158768            |
| 0.1211        | 6.0   | 678  | 0.2170          | 190304            |
| 0.1292        | 7.0   | 791  | 0.3284          | 222400            |
| 0.0002        | 8.0   | 904  | 0.2181          | 253488            |
| 0.0004        | 9.0   | 1017 | 0.4297          | 284920            |
| 0.0001        | 10.0  | 1130 | 0.3341          | 316840            |
| 0.0           | 11.0  | 1243 | 0.3159          | 348312            |
| 0.0001        | 12.0  | 1356 | 0.3290          | 379920            |
| 0.0           | 13.0  | 1469 | 0.3348          | 411568            |
| 0.0001        | 14.0  | 1582 | 0.3400          | 443536            |
| 0.0           | 15.0  | 1695 | 0.3370          | 475000            |
| 0.0           | 16.0  | 1808 | 0.3393          | 506880            |
| 0.0           | 17.0  | 1921 | 0.3446          | 538552            |
| 0.0           | 18.0  | 2034 | 0.3373          | 570488            |
| 0.0           | 19.0  | 2147 | 0.3423          | 602176            |
| 0.0           | 20.0  | 2260 | 0.3384          | 633448            |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
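
To approximate this environment, the versions above can be pinned, e.g. in a requirements.txt (the +cu128 PyTorch build typically comes from the matching CUDA wheel index rather than the default PyPI wheel):

```
peft==0.15.2
transformers==4.51.3
torch==2.8.0
datasets==3.6.0
tokenizers==0.21.1
```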