train_stsb_101112_1760638039

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

Loss: 0.4492
Num Input Tokens Seen: 8712528

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 101112
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
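
To make the cosine schedule with warmup concrete, the sketch below computes the learning rate at a few steps from the values listed above. It is an illustration, not the training code: the total of 25880 steps is taken from the last row of the training results table, and deriving the warmup length by applying the ratio to the total step count is an assumption.

```python
import math

BASE_LR = 5e-05      # learning_rate from the list above
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 25880  # assumption: final step in the training results table


def lr_at_step(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS,
               warmup_ratio=WARMUP_RATIO):
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    # Assumption: warmup length is the ratio applied to the total step count.
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Start, mid-warmup, peak, mid-training, and final step.
for step in (0, 1294, 2588, 12940, 25880):
    print(f"step {step:>5}: lr = {lr_at_step(step):.3e}")
```

Under these assumptions the rate peaks at 5e-05 at step 2588 (the end of epoch 2) and decays towards zero by the final step.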

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.5764	1.0	1294	0.4940	434624
0.3625	2.0	2588	0.4568	869056
0.3437	3.0	3882	0.4492	1304160
0.3981	4.0	5176	0.4617	1740416
0.269	5.0	6470	0.5114	2175568
0.2937	6.0	7764	0.5567	2611168
0.3326	7.0	9058	0.5906	3047200
0.3442	8.0	10352	0.6815	3482720
0.2219	9.0	11646	0.7417	3918416
0.18	10.0	12940	0.8248	4355072
0.1435	11.0	14234	0.9338	4790336
0.1133	12.0	15528	1.1174	5227040
0.0715	13.0	16822	1.2759	5662848
0.0747	14.0	18116	1.4643	6099600
0.0534	15.0	19410	1.6499	6534256
0.0093	16.0	20704	1.8866	6968992
0.0028	17.0	21998	1.9613	7405040
0.0004	18.0	23292	2.0310	7840784
0.0002	19.0	24586	2.0380	8276160
0.0003	20.0	25880	2.0479	8712528
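
The table shows training loss falling towards zero while validation loss rises after epoch 3, which is the usual signature of overfitting. The sketch below, with the epoch, training loss, and validation loss columns copied from the table, picks the epoch with the lowest validation loss. The helper name `best_epoch` is made up for this example.

```python
# (epoch, training_loss, validation_loss) copied from the table above.
RESULTS = [
    (1, 0.5764, 0.4940), (2, 0.3625, 0.4568), (3, 0.3437, 0.4492),
    (4, 0.3981, 0.4617), (5, 0.2690, 0.5114), (6, 0.2937, 0.5567),
    (7, 0.3326, 0.5906), (8, 0.3442, 0.6815), (9, 0.2219, 0.7417),
    (10, 0.1800, 0.8248), (11, 0.1435, 0.9338), (12, 0.1133, 1.1174),
    (13, 0.0715, 1.2759), (14, 0.0747, 1.4643), (15, 0.0534, 1.6499),
    (16, 0.0093, 1.8866), (17, 0.0028, 1.9613), (18, 0.0004, 2.0310),
    (19, 0.0002, 2.0380), (20, 0.0003, 2.0479),
]


def best_epoch(results):
    """Return the row with the lowest validation loss."""
    return min(results, key=lambda row: row[2])


epoch, train_loss, val_loss = best_epoch(RESULTS)
print(f"lowest validation loss {val_loss:.4f} at epoch {epoch}")

# Gap between validation and training loss at the final epoch.
final_epoch, final_train, final_val = RESULTS[-1]
print(f"epoch {final_epoch}: validation - training = {final_val - final_train:.4f}")
```

The minimum, 0.4492 at epoch 3, matches the evaluation loss reported at the top of this card.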

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
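
When trying to reproduce the run, it can help to compare the local environment against the versions listed above. The following is a minimal sketch using only the standard library. The distribution names (for example `torch` for Pytorch) are the usual PyPI names, and the helper functions are made up for this example.

```python
from importlib import metadata

# Versions from the list above, keyed by PyPI distribution name.
PINNED = {
    "peft": "0.17.1",
    "transformers": "4.51.3",
    "torch": "2.9.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(pinned, installed):
    """Return {name: (wanted, found)} for every package that differs."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pinned.items()
        if installed.get(name) != wanted
    }


for name, (wanted, found) in find_mismatches(PINNED, installed_versions(PINNED)).items():
    print(f"{name}: card lists {wanted}, found {found}")
```

An exact string match is deliberately strict. A different CUDA build suffix on `torch` is reported as a mismatch even if it would behave the same.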

Model tree for rbelanec/train_stsb_101112_1760638039

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model
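
Because the model tree lists this repository as an adapter on top of the base model, one plausible way to use it is to load the base model and attach the adapter with PEFT. The card does not state the adapter type or task head, so treat this as a sketch: loading the base as a causal language model is an assumption, and access to the base model may need to be granted on the Hub before the download works.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_stsb_101112_1760638039"


def load_model_with_adapter(base_model_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the adapter weights from this repository."""
    # Imported lazily so the constants above can be used without the
    # heavy dependencies installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    # Assumption: the adapter was trained on the causal LM form of the base model.
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_model_with_adapter()` downloads the 8B-parameter base model, so it needs substantial disk space and memory.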
