# train_wsc_42_1763998309
This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:
- Loss: 0.3510
- Num Input Tokens Seen: 439936
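Since the framework versions below include PEFT, this checkpoint is presumably a PEFT adapter rather than full model weights. A minimal inference sketch, assuming the adapter loads on top of the base model via `peft` (the WSC-style prompt is illustrative, not taken from the training data):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-1B-Instruct"
adapter_id = "rbelanec/train_wsc_42_1763998309"  # this repo, assumed to hold the adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter
model.eval()

# Illustrative Winograd-style coreference prompt.
prompt = "The trophy didn't fit in the suitcase because it was too big. What does 'it' refer to?"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```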
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a minimal configuration sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
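A minimal sketch of these settings expressed as `transformers.TrainingArguments`, assuming the standard `Trainer` workflow (dataset preparation and the PEFT adapter setup are omitted):

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; output_dir is an assumption.
training_args = TrainingArguments(
    output_dir="train_wsc_42_1763998309",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```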
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.37 | 0.5020 | 125 | 0.3767 | 21680 |
| 0.3635 | 1.0040 | 250 | 0.3510 | 44352 |
| 0.3622 | 1.5060 | 375 | 0.3543 | 66144 |
| 0.3709 | 2.0080 | 500 | 0.3519 | 88400 |
| 0.329 | 2.5100 | 625 | 0.3622 | 110080 |
| 0.3834 | 3.0120 | 750 | 0.3635 | 132848 |
| 0.3667 | 3.5141 | 875 | 0.3585 | 155136 |
| 0.3157 | 4.0161 | 1000 | 0.3602 | 176880 |
| 0.3154 | 4.5181 | 1125 | 0.3550 | 198976 |
| 0.3388 | 5.0201 | 1250 | 0.3535 | 220656 |
| 0.355 | 5.5221 | 1375 | 0.3561 | 242336 |
| 0.34 | 6.0241 | 1500 | 0.3593 | 264736 |
| 0.3583 | 6.5261 | 1625 | 0.3623 | 286816 |
| 0.3206 | 7.0281 | 1750 | 0.3571 | 309056 |
| 0.3306 | 7.5301 | 1875 | 0.3616 | 331424 |
| 0.2795 | 8.0321 | 2000 | 0.3605 | 353664 |
| 0.3327 | 8.5341 | 2125 | 0.3567 | 375328 |
| 0.3406 | 9.0361 | 2250 | 0.3597 | 397584 |
| 0.3351 | 9.5382 | 2375 | 0.3600 | 419584 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4