train_wic_101112_1760638030

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wic dataset. It achieves the following results on the evaluation set:

  • Loss: 1.8323
  • Num Input Tokens Seen: 7502512
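
Below is a minimal usage sketch, assuming this repository hosts a PEFT (LoRA) adapter on top of meta-llama/Meta-Llama-3-8B-Instruct, as the framework versions suggest. The adapter id matches this card; the WiC-style prompt is illustrative only, since the card does not document the training prompt template.

```python
# Minimal sketch: load the base model, attach this adapter, and run a WiC-style query.
# Assumptions: the repo is a PEFT adapter; the prompt format below is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wic_101112_1760638030"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter weights

prompt = (
    'Is the word "bank" used in the same sense in both sentences?\n'
    "1. She sat on the river bank.\n"
    "2. He deposited money at the bank.\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```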

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
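
For reference, the same settings can be expressed with Hugging Face `TrainingArguments`; this is a sketch reconstructed from the list above, not the original training script (the output directory name and any omitted arguments are assumptions).

```python
# Sketch: TrainingArguments mirroring the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wic_101112_1760638030",  # assumed; not specified on the card
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```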

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.349         | 2.0   | 2172  | 0.3450          | 750592            |
| 0.2779        | 4.0   | 4344  | 0.3794          | 1500800           |
| 0.305         | 6.0   | 6516  | 0.3511          | 2251232           |
| 0.3995        | 8.0   | 8688  | 0.3773          | 3001072           |
| 0.2226        | 10.0  | 10860 | 0.4986          | 3751408           |
| 0.2789        | 12.0  | 13032 | 0.8729          | 4501296           |
| 0.2892        | 14.0  | 15204 | 1.2637          | 5252112           |
| 0.0002        | 16.0  | 17376 | 1.7115          | 6002224           |
| 0.0001        | 18.0  | 19548 | 1.8162          | 6752432           |
| 0.0           | 20.0  | 21720 | 1.8323          | 7502512           |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4