train_cb_1757340239

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4669
  • Num Input Tokens Seen: 612968
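
The card does not include loading code; the snippet below is a minimal sketch, assuming this repository hosts a PEFT adapter that is applied on top of the frozen base model. The repository id and base-model id come from this card; everything else (dtype, device placement) is an assumption.

```python
# Minimal loading sketch (assumptions: adapter lives at rbelanec/train_cb_1757340239,
# and you have access to the gated base model on the Hub).
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cb_1757340239"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype="auto",   # pick the checkpoint's native dtype
    device_map="auto",    # requires accelerate; remove to load on CPU
)

# Attach the PEFT adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```

If the adapter is a LoRA-style adapter, `model.merge_and_unload()` can fold it into the base weights for adapter-free inference.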

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
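
As a hedged sketch, the listed hyperparameters would map onto transformers.TrainingArguments roughly as follows; output_dir is a placeholder, and any setting not listed above (logging, saving, etc.) is left at its default rather than taken from this card.

```python
# Hypothetical reconstruction of the training configuration from the
# hyperparameter list above; not the author's actual training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_cb_1757340239",  # placeholder, not from the card
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=789,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```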

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.175         | 1.0   | 113  | 0.9233          | 30520             |
| 0.1677        | 2.0   | 226  | 0.4181          | 61312             |
| 0.5974        | 3.0   | 339  | 0.2543          | 92192             |
| 0.2458        | 4.0   | 452  | 0.3391          | 122752            |
| 0.1608        | 5.0   | 565  | 0.2920          | 153112            |
| 0.0079        | 6.0   | 678  | 0.4712          | 183568            |
| 0.0962        | 7.0   | 791  | 0.4413          | 214352            |
| 0.0015        | 8.0   | 904  | 0.3483          | 245208            |
| 0.0001        | 9.0   | 1017 | 0.4570          | 275632            |
| 0.0002        | 10.0  | 1130 | 0.4189          | 306152            |
| 0.0           | 11.0  | 1243 | 0.4947          | 336688            |
| 0.0           | 12.0  | 1356 | 0.4761          | 367392            |
| 0.0           | 13.0  | 1469 | 0.4681          | 398224            |
| 0.0           | 14.0  | 1582 | 0.4658          | 428448            |
| 0.0           | 15.0  | 1695 | 0.4718          | 459320            |
| 0.0           | 16.0  | 1808 | 0.4760          | 489768            |
| 0.0           | 17.0  | 1921 | 0.4684          | 520440            |
| 0.0           | 18.0  | 2034 | 0.4734          | 551464            |
| 0.0           | 19.0  | 2147 | 0.4698          | 582408            |
| 0.0           | 20.0  | 2260 | 0.4669          | 612968            |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1