train_sst2_101112_1760638075

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the sst2 dataset. It achieves the following results on the evaluation set:

Loss: 0.0537
Num Input Tokens Seen: 67752208
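Because the framework versions list PEFT, the weights in this repository are most likely an adapter that has to be attached to the base model before use. The sketch below shows one way to do that. It assumes a causal language modeling head, access to the gated base model, and that the adapter lives under the repository id `rbelanec/train_sst2_101112_1760638075`; the helper name `load_adapter_model` is hypothetical and not part of any library.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
# Assumption: the adapter is published under this repository id.
ADAPTER_ID = "rbelanec/train_sst2_101112_1760638075"


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the PEFT adapter (downloads weights)."""
    # Imported lazily so the constants above can be inspected without
    # the heavy dependencies installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # Assumption: the adapter was trained on top of a causal LM head.
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_adapter_model()` downloads the full 8B base model, so it needs substantial disk space and memory. The prompt format used during fine-tuning is not documented in this card, so any inference prompt would be a guess.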

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 101112
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
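To make the scheduler settings concrete, the following sketch computes the learning rate at a given step for a cosine schedule with linear warmup, using the values listed above. It is a plain-Python approximation of the usual half-cosine decay, not the training code itself. It assumes the scheduler horizon equals the final step count in the training results (303080) and that the warmup length is that count times the warmup ratio, rounded; the function name `lr_at` is hypothetical.

```python
import math

PEAK_LR = 0.03        # learning_rate
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio
TOTAL_STEPS = 303080  # assumption: final step of epoch 20 is the scheduler horizon


def lr_at(step, total_steps=TOTAL_STEPS, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step` under linear warmup followed by cosine decay."""
    # Assumption: warmup length is the rounded product of steps and ratio.
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from the peak down to 0.
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 15154, 30308, 166694, 303080):
        print(step, lr_at(step))
```

Under these assumptions the warmup covers the first 30308 steps, which is exactly the first two epochs, and the learning rate falls to half its peak at step 166694.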

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.3488	1.0	15154	0.3445	3387376
0.315	2.0	30308	0.3340	6776272
0.0665	3.0	45462	0.0605	10162816
0.013	4.0	60616	0.0600	13553408
0.0098	5.0	75770	0.0664	16938528
0.0109	6.0	90924	0.0555	20326208
0.112	7.0	106078	0.0569	23713072
0.0446	8.0	121232	0.0555	27100096
0.0556	9.0	136386	0.0539	30487024
0.0969	10.0	151540	0.0571	33871648
0.0179	11.0	166694	0.0537	37260608
0.0146	12.0	181848	0.0563	40646320
0.0075	13.0	197002	0.0552	44035072
0.1066	14.0	212156	0.0557	47423936
0.0913	15.0	227310	0.0578	50810512
0.0466	16.0	242464	0.0558	54201824
0.0419	17.0	257618	0.0563	57588768
0.0216	18.0	272772	0.0564	60979040
0.0134	19.0	287926	0.0563	64364064
0.1048	20.0	303080	0.0562	67752208
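The reported evaluation loss of 0.0537 is not the final-epoch value (0.0562). A short sketch that scans the validation losses from the table shows which epoch it matches. The pairs are copied from the table above, and the helper name `best_epoch` is hypothetical.

```python
# (epoch, validation loss) pairs copied from the training results table.
VALIDATION_LOSS = [
    (1, 0.3445), (2, 0.3340), (3, 0.0605), (4, 0.0600), (5, 0.0664),
    (6, 0.0555), (7, 0.0569), (8, 0.0555), (9, 0.0539), (10, 0.0571),
    (11, 0.0537), (12, 0.0563), (13, 0.0552), (14, 0.0557), (15, 0.0578),
    (16, 0.0558), (17, 0.0563), (18, 0.0564), (19, 0.0563), (20, 0.0562),
]


def best_epoch(rows):
    """Return the (epoch, loss) pair with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


if __name__ == "__main__":
    epoch, loss = best_epoch(VALIDATION_LOSS)
    print(f"lowest validation loss {loss} at epoch {epoch}")
```

The minimum falls at epoch 11, which suggests, though the card does not confirm it, that the headline result comes from the best checkpoint and not the last one. Validation loss is essentially flat from epoch 6 onward, staying between 0.0537 and 0.0578.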

Framework versions

PEFT 0.17.1
Transformers 4.51.3
PyTorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
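When reproducing the run, it can help to compare the local environment with the versions listed above. The sketch below uses only the standard library. The distribution names (`peft`, `transformers`, `torch`, `datasets`, `tokenizers`) are assumed to be the PyPI package names for the listed frameworks, and `compare_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "peft": "0.17.1",
    "transformers": "4.51.3",
    "torch": "2.9.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def compare_versions(expected=EXPECTED):
    """Map each package to (expected version, installed version or None)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in compare_versions().items():
        status = "ok" if wanted == installed else "differs"
        print(f"{name}: expected {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily break loading the adapter; the check only flags where the environment differs from the one recorded here.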

