train_stsb_101112_1760638041

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

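Loss: 0.4620 (the lowest validation loss in the training results, reached at epoch 18)
Num Input Tokens Seen: 8712528 (total after all 20 epochs)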

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 101112
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
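
To make the learning-rate settings above concrete, here is a minimal sketch of a cosine schedule with linear warmup in plain Python. It assumes the usual shape of such a schedule (linear ramp from 0 to the peak rate, then a half-cosine decay to 0) and takes the total step count of 25880 from the training results. The helper name lr_at and the rounding used for the warmup length are illustrative assumptions, not taken from the training code.

```python
import math

BASE_LR = 5e-05        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio
TOTAL_STEPS = 25880    # last step reported in the training results


def lr_at(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_ratio=WARMUP_RATIO):
    """Return the learning rate at a given optimizer step (hypothetical helper)."""
    # Assumption: warmup length is the rounded fraction of the total steps.
    warmup_steps = int(round(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Half-cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 1294, 2588, 12940, 25880):
        print(step, lr_at(step))
```

Under these assumptions the warmup covers 2588 steps, which is exactly the first two epochs of the run, and the rate peaks at 5e-05 before decaying.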

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
1.3293	1.0	1294	1.0926	434624
0.5139	2.0	2588	0.6458	869056
0.4608	3.0	3882	0.5673	1304160
0.584	4.0	5176	0.5370	1740416
0.3868	5.0	6470	0.5146	2175568
0.4999	6.0	7764	0.5019	2611168
0.4161	7.0	9058	0.4916	3047200
0.522	8.0	10352	0.4850	3482720
0.464	9.0	11646	0.4782	3918416
0.5449	10.0	12940	0.4738	4355072
0.62	11.0	14234	0.4719	4790336
0.5518	12.0	15528	0.4678	5227040
0.4453	13.0	16822	0.4661	5662848
0.4771	14.0	18116	0.4656	6099600
0.5392	15.0	19410	0.4644	6534256
0.4023	16.0	20704	0.4643	6968992
0.3908	17.0	21998	0.4630	7405040
0.5031	18.0	23292	0.4620	7840784
0.3964	19.0	24586	0.4628	8276160
0.5761	20.0	25880	0.4631	8712528
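
The table above can be checked programmatically. The following sketch copies the epoch, step, validation loss, and token columns into a list and finds the epoch with the lowest validation loss, along with the steps per epoch. The names ROWS, best_row, and steps_per_epoch are illustrative, not part of any library.

```python
# (epoch, step, validation_loss, input_tokens_seen), copied from the table above.
ROWS = [
    (1, 1294, 1.0926, 434624),
    (2, 2588, 0.6458, 869056),
    (3, 3882, 0.5673, 1304160),
    (4, 5176, 0.5370, 1740416),
    (5, 6470, 0.5146, 2175568),
    (6, 7764, 0.5019, 2611168),
    (7, 9058, 0.4916, 3047200),
    (8, 10352, 0.4850, 3482720),
    (9, 11646, 0.4782, 3918416),
    (10, 12940, 0.4738, 4355072),
    (11, 14234, 0.4719, 4790336),
    (12, 15528, 0.4678, 5227040),
    (13, 16822, 0.4661, 5662848),
    (14, 18116, 0.4656, 6099600),
    (15, 19410, 0.4644, 6534256),
    (16, 20704, 0.4643, 6968992),
    (17, 21998, 0.4630, 7405040),
    (18, 23292, 0.4620, 7840784),
    (19, 24586, 0.4628, 8276160),
    (20, 25880, 0.4631, 8712528),
]


def best_row(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r[2])


def steps_per_epoch(rows):
    """Return the step count of the first epoch; later epochs add the same amount."""
    return rows[0][1] // rows[0][0]


if __name__ == "__main__":
    epoch, step, loss, tokens = best_row(ROWS)
    print(f"best epoch: {epoch}, step: {step}, validation loss: {loss:.4f}")
    print(f"steps per epoch: {steps_per_epoch(ROWS)}")
    print(f"average tokens per step: {ROWS[-1][3] / ROWS[-1][1]:.1f}")
```

Validation loss bottoms out at epoch 18 and rises slightly over the last two epochs, so the headline loss of 0.4620 corresponds to that epoch rather than to the final one.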

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
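
Because PEFT appears in the framework list, the checkpoint is presumably an adapter that is applied on top of the base model. The following is a minimal loading sketch under that assumption. It has not been verified against this repository, it assumes a causal language modeling head, and it requires access to the gated base model. The function name load_adapter_model is hypothetical.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_stsb_101112_1760638041"


def load_adapter_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the adapter weights (sketch, see assumptions above)."""
    # Imported here so that defining the function does not require the libraries.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Assumption: the repository holds PEFT adapter weights for this base model.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling load_adapter_model() downloads both the base weights and the adapter, so it needs substantial disk space and memory for an 8B-parameter model.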
