train_wsc_42_1763630699

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc (Winograd Schema Challenge) dataset, trained as a PEFT adapter. It achieves the following results on the evaluation set:

  • Loss: 0.3772
  • Num Input Tokens Seen: 439936
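Because the checkpoint is a PEFT adapter rather than a full model, it is used by attaching the adapter to the base model. Below is a minimal loading sketch; the adapter repo id rbelanec/train_wsc_42_1763630699 is taken from this card, and the prompt is purely illustrative, since the WSC prompt template used during training is not documented here.

```python
# Minimal loading sketch: base model + PEFT adapter.
# Assumes access to the gated meta-llama base model and to this adapter repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_42_1763630699"  # adapter repo id from this card

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter
model.eval()

# Illustrative WSC-style coreference prompt; the actual training template is unknown.
prompt = "The trophy didn't fit in the suitcase because it was too big. What does 'it' refer to?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```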

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
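For reference, these settings map onto transformers TrainingArguments roughly as follows. This is a sketch, not the actual training script: dataset preparation and the PEFT/LoRA configuration are omitted because they are not documented in this card, and the output_dir is hypothetical.

```python
# Sketch of the hyperparameters above expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_42_1763630699",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```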

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 0.4025        | 0.5020 | 125  | 0.4167          | 21680             |
| 0.5404        | 1.0040 | 250  | 0.3906          | 44352             |
| 0.404         | 1.5060 | 375  | 0.3863          | 66144             |
| 0.3617        | 2.0080 | 500  | 0.3810          | 88400             |
| 0.3566        | 2.5100 | 625  | 0.3816          | 110080            |
| 0.3774        | 3.0120 | 750  | 0.3974          | 132848            |
| 0.3646        | 3.5141 | 875  | 0.3821          | 155136            |
| 0.3182        | 4.0161 | 1000 | 0.3772          | 176880            |
| 0.3308        | 4.5181 | 1125 | 0.3854          | 198976            |
| 0.3538        | 5.0201 | 1250 | 0.3810          | 220656            |
| 0.3482        | 5.5221 | 1375 | 0.3917          | 242336            |
| 0.3292        | 6.0241 | 1500 | 0.3838          | 264736            |
| 0.3629        | 6.5261 | 1625 | 0.3985          | 286816            |
| 0.3227        | 7.0281 | 1750 | 0.3992          | 309056            |
| 0.3125        | 7.5301 | 1875 | 0.4022          | 331424            |
| 0.2885        | 8.0321 | 2000 | 0.4007          | 353664            |
| 0.2861        | 8.5341 | 2125 | 0.4039          | 375328            |
| 0.3694        | 9.0361 | 2250 | 0.4033          | 397584            |
| 0.3345        | 9.5382 | 2375 | 0.4031          | 419584            |
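The best validation loss (0.3772, the value reported above) occurs at step 1000 (epoch ≈ 4.02); after that, validation loss drifts upward while training loss continues to fall, which suggests mild overfitting in the later epochs.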

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.1+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
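Since behavior can vary across library versions, a quick sanity check of the installed environment against the list above (a minimal sketch; all of these packages expose __version__):

```python
# Print installed versions to compare against the framework versions listed above.
import peft, transformers, torch, datasets, tokenizers

for name, mod in [("PEFT", peft), ("Transformers", transformers),
                  ("PyTorch", torch), ("Datasets", datasets),
                  ("Tokenizers", tokenizers)]:
    print(f"{name}: {mod.__version__}")
```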