# train_wsc_42_1763630699
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the wsc (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set:

- Loss: 0.3772
- Num Input Tokens Seen: 439936
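Since the framework versions below list PEFT, this checkpoint is presumably a PEFT adapter rather than a full set of weights. A minimal loading sketch, assuming the adapter is published at `rbelanec/train_wsc_42_1763630699` on the Hub:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_42_1763630699"  # this card's Hub repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter weights
model.eval()
```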
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
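For reference, a minimal sketch of how these values would map onto a `transformers.TrainingArguments` object; the `output_dir` is a placeholder, and the evaluation cadence is inferred from the results table rather than recorded in this card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_42_1763630699",  # placeholder; actual output dir not recorded
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
    eval_strategy="steps",
    eval_steps=125,  # assumed from the 125-step cadence in the results table below
)
```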
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.4025 | 0.5020 | 125 | 0.4167 | 21680 |
| 0.5404 | 1.0040 | 250 | 0.3906 | 44352 |
| 0.404 | 1.5060 | 375 | 0.3863 | 66144 |
| 0.3617 | 2.0080 | 500 | 0.3810 | 88400 |
| 0.3566 | 2.5100 | 625 | 0.3816 | 110080 |
| 0.3774 | 3.0120 | 750 | 0.3974 | 132848 |
| 0.3646 | 3.5141 | 875 | 0.3821 | 155136 |
| 0.3182 | 4.0161 | 1000 | 0.3772 | 176880 |
| 0.3308 | 4.5181 | 1125 | 0.3854 | 198976 |
| 0.3538 | 5.0201 | 1250 | 0.3810 | 220656 |
| 0.3482 | 5.5221 | 1375 | 0.3917 | 242336 |
| 0.3292 | 6.0241 | 1500 | 0.3838 | 264736 |
| 0.3629 | 6.5261 | 1625 | 0.3985 | 286816 |
| 0.3227 | 7.0281 | 1750 | 0.3992 | 309056 |
| 0.3125 | 7.5301 | 1875 | 0.4022 | 331424 |
| 0.2885 | 8.0321 | 2000 | 0.4007 | 353664 |
| 0.2861 | 8.5341 | 2125 | 0.4039 | 375328 |
| 0.3694 | 9.0361 | 2250 | 0.4033 | 397584 |
| 0.3345 | 9.5382 | 2375 | 0.4031 | 419584 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4