train_cola_101112_1760638044

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cola dataset. It achieves the following results on the evaluation set:

Loss: 0.1636
Num Input Tokens Seen: 7325256

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 4
eval_batch_size: 4
seed: 101112
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
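
The cosine schedule with a 0.1 warmup ratio can be made concrete with a small sketch. It assumes the 38480 total optimizer steps reported at epoch 20 in the training results, a linear warmup from zero, and a single half-cosine decay to zero. The helper name `lr_at` and the rounding of the warmup length are illustrative choices, not taken from the training code.

```python
import math

# Values copied from the hyperparameter list and the training results.
PEAK_LR = 0.001
WARMUP_RATIO = 0.1
TOTAL_STEPS = 38480  # step count reported at epoch 20.0

# Assumption: warmup length is the warmup ratio applied to the total step count.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Hypothetical helper: learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Half-cosine decay from the peak learning rate down to 0.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 1924, WARMUP_STEPS, 19240, TOTAL_STEPS):
        print(f"step {step:>5}: lr = {lr_at(step):.6f}")
```

Under these assumptions the warmup ends after 3848 steps, which coincides with the end of epoch 2 in the training results.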

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.2357	1.0	1924	0.2709	366136
0.2299	2.0	3848	0.1864	732880
0.111	3.0	5772	0.1455	1099816
0.1013	4.0	7696	0.1425	1465464
0.1245	5.0	9620	0.1338	1831728
0.1636	6.0	11544	0.1381	2198176
0.1655	7.0	13468	0.1348	2564208
0.035	8.0	15392	0.1460	2930240
0.0486	9.0	17316	0.1341	3297136
0.0396	10.0	19240	0.1487	3663392
0.0693	11.0	21164	0.1524	4028760
0.0187	12.0	23088	0.1645	4394320
0.0341	13.0	25012	0.1672	4761000
0.0522	14.0	26936	0.1779	5127440
0.1024	15.0	28860	0.1908	5494368
0.0535	16.0	30784	0.2188	5860888
0.0826	17.0	32708	0.2318	6226952
0.0033	18.0	34632	0.2388	6593400
0.0023	19.0	36556	0.2486	6959600
0.0222	20.0	38480	0.2490	7325256
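
The table shows validation loss bottoming out well before the final epoch while training loss keeps falling. A minimal sketch for locating the best epoch, using the validation losses copied from the table above (the variable names are illustrative):

```python
# Validation loss per epoch, copied from the training results table (epochs 1..20).
validation_loss = [
    0.2709, 0.1864, 0.1455, 0.1425, 0.1338,
    0.1381, 0.1348, 0.1460, 0.1341, 0.1487,
    0.1524, 0.1645, 0.1672, 0.1779, 0.1908,
    0.2188, 0.2318, 0.2388, 0.2486, 0.2490,
]

# Epochs are 1-indexed in the table, so shift the list index by one.
best_epoch = min(range(len(validation_loss)), key=validation_loss.__getitem__) + 1
best_loss = validation_loss[best_epoch - 1]
final_loss = validation_loss[-1]

print(f"best epoch: {best_epoch}, validation loss: {best_loss:.4f}")
print(f"final epoch: {len(validation_loss)}, validation loss: {final_loss:.4f}")
```

The minimum, 0.1338, falls at epoch 5. By epoch 20 the validation loss has risen to 0.2490, which suggests overfitting in the later epochs. Which checkpoint the published adapter corresponds to is not stated in this card.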

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
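
Since PEFT is listed, the repository presumably holds adapter weights rather than a full model. A minimal loading sketch follows, assuming the adapter attaches to the causal language model variant of the base model and that you have access to the base model weights. `load_adapter` is a hypothetical helper name, and this card does not document the prompt format used during fine-tuning.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_cola_101112_1760638044"


def load_adapter():
    """Hypothetical helper: load the base model and attach this adapter."""
    # Imported lazily so the module can be inspected without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID)
    # Assumption: the adapter was trained against the causal LM variant of the base model.
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_adapter()` downloads the 8B base model, so it needs substantial disk space and memory.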

Model tree for rbelanec/train_cola_101112_1760638044

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model
