train_wsc_42_1760466772

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3524
  • Num Input Tokens Seen: 1468632

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a minimal reproduction sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.15
  • num_epochs: 30
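
The training script itself is not part of this card. As a rough reproduction aid, the sketch below maps the hyperparameters above onto Hugging Face `TrainingArguments`; the dataset preprocessing and the PEFT adapter configuration are assumptions, since neither is documented here.

```python
# Sketch only: TrainingArguments mirroring the hyperparameters listed above.
# The actual training script, data formatting, and PEFT adapter config are not
# part of this card, so treat this as an approximation.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_42_1760466772",
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # effective (total) train batch size: 16
    num_train_epochs=30,
    lr_scheduler_type="cosine",
    warmup_ratio=0.15,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```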

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.4818        | 1.5   | 42   | 0.3624          | 73824             |
| 0.3847        | 3.0   | 84   | 0.3849          | 146552            |
| 0.3464        | 4.5   | 126  | 0.3481          | 221264            |
| 0.3578        | 6.0   | 168  | 0.3743          | 294256            |
| 0.3492        | 7.5   | 210  | 0.3458          | 368144            |
| 0.3542        | 9.0   | 252  | 0.3495          | 439768            |
| 0.3400        | 10.5  | 294  | 0.3504          | 514888            |
| 0.3578        | 12.0  | 336  | 0.3650          | 586448            |
| 0.3462        | 13.5  | 378  | 0.3538          | 662016            |
| 0.3506        | 15.0  | 420  | 0.3557          | 735680            |
| 0.3489        | 16.5  | 462  | 0.3519          | 810232            |
| 0.3528        | 18.0  | 504  | 0.3558          | 882920            |
| 0.3408        | 19.5  | 546  | 0.3517          | 957488            |
| 0.3469        | 21.0  | 588  | 0.3542          | 1029792           |
| 0.3488        | 22.5  | 630  | 0.3554          | 1103160           |
| 0.3402        | 24.0  | 672  | 0.3529          | 1176968           |
| 0.3422        | 25.5  | 714  | 0.3558          | 1250064           |
| 0.3474        | 27.0  | 756  | 0.3538          | 1321408           |
| 0.3409        | 28.5  | 798  | 0.3524          | 1394904           |
| 0.3344        | 30.0  | 840  | 0.3524          | 1468632           |
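
For a quick visual check of convergence, the validation-loss values from the table can be plotted directly. The snippet below simply re-enters the numbers above and assumes matplotlib is installed.

```python
# Plot the validation-loss trajectory reported in the table above.
import matplotlib.pyplot as plt

epochs = [1.5, 3.0, 4.5, 6.0, 7.5, 9.0, 10.5, 12.0, 13.5, 15.0,
          16.5, 18.0, 19.5, 21.0, 22.5, 24.0, 25.5, 27.0, 28.5, 30.0]
val_loss = [0.3624, 0.3849, 0.3481, 0.3743, 0.3458, 0.3495, 0.3504, 0.3650,
            0.3538, 0.3557, 0.3519, 0.3558, 0.3517, 0.3542, 0.3554, 0.3529,
            0.3558, 0.3538, 0.3524, 0.3524]

plt.plot(epochs, val_loss, marker="o")
plt.xlabel("Epoch")
plt.ylabel("Validation loss")
plt.title("train_wsc_42_1760466772: validation loss during training")
plt.show()
```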

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
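
To run the model, load the adapter on top of the base model with PEFT. The sketch below assumes the adapter is fetched from this repository (rbelanec/train_wsc_42_1760466772) and that you have access to the gated meta-llama/Meta-Llama-3-8B-Instruct weights; the Winograd-style prompt is only an illustration, since the exact prompt format used during fine-tuning is not documented here.

```python
# Minimal inference sketch: attach this PEFT adapter to the base model.
# Assumes the adapter lives at "rbelanec/train_wsc_42_1760466772"; swap in a
# local path if you downloaded the checkpoint. The prompt format is illustrative.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_42_1760466772"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

prompt = "The trophy didn't fit in the suitcase because it was too big. What was too big?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

If the adapter is a LoRA, `model.merge_and_unload()` can fold its weights into the base model for adapter-free deployment; the adapter type is not stated in this card, so check the PEFT config before relying on that.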