# train_cola_101112_1760638043
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the CoLA (Corpus of Linguistic Acceptability) dataset. It achieves the following results on the evaluation set:
- Loss: 0.2553
- Num Input Tokens Seen: 7325256
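Since PEFT is listed under the framework versions below, the checkpoint is presumably a PEFT adapter applied on top of the base model. A minimal loading sketch, assuming the adapter lives at `rbelanec/train_cola_101112_1760638043` (the repo id from this card); the prompt wording is an illustrative assumption, not the format used in training:

```python
# Minimal inference sketch. Assumptions: the checkpoint is a PEFT adapter
# (PEFT appears in the framework versions), and the prompt is illustrative.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cola_101112_1760638043"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# CoLA is a binary grammatical-acceptability task; the exact prompt used in
# training is not documented here, so this wording is an assumption.
prompt = "Is the following sentence grammatically acceptable? Sentence: They drank the pub dry."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```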
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.03
- train_batch_size: 4
- eval_batch_size: 4
- seed: 101112
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
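As a reproducibility aid, here is how those settings map onto `transformers.TrainingArguments`. This is a sketch only: the PEFT configuration and data pipeline are omitted, and `output_dir` is a placeholder not stated in the card.

```python
# Sketch: the listed hyperparameters expressed as TrainingArguments.
# output_dir is a placeholder; PEFT config and data pipeline are omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_cola_101112_1760638043",  # placeholder
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```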
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.2403 | 1.0 | 1924 | 0.2656 | 366136 |
| 0.2883 | 2.0 | 3848 | 0.2799 | 732880 |
| 0.2658 | 3.0 | 5772 | 0.2624 | 1099816 |
| 0.2307 | 4.0 | 7696 | 0.2575 | 1465464 |
| 0.2962 | 5.0 | 9620 | 0.2564 | 1831728 |
| 0.2647 | 6.0 | 11544 | 0.2573 | 2198176 |
| 0.2755 | 7.0 | 13468 | 0.2562 | 2564208 |
| 0.265 | 8.0 | 15392 | 0.2556 | 2930240 |
| 0.2766 | 9.0 | 17316 | 0.2560 | 3297136 |
| 0.2734 | 10.0 | 19240 | 0.2588 | 3663392 |
| 0.2854 | 11.0 | 21164 | 0.2556 | 4028760 |
| 0.2402 | 12.0 | 23088 | 0.2594 | 4394320 |
| 0.2395 | 13.0 | 25012 | 0.2561 | 4761000 |
| 0.2578 | 14.0 | 26936 | 0.2555 | 5127440 |
| 0.2534 | 15.0 | 28860 | 0.2555 | 5494368 |
| 0.2798 | 16.0 | 30784 | 0.2557 | 5860888 |
| 0.241 | 17.0 | 32708 | 0.2553 | 6226952 |
| 0.3141 | 18.0 | 34632 | 0.2559 | 6593400 |
| 0.2296 | 19.0 | 36556 | 0.2556 | 6959600 |
| 0.2954 | 20.0 | 38480 | 0.2562 | 7325256 |
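The headline evaluation loss (0.2553) matches the epoch-17 checkpoint, the lowest validation loss in the run, while the token count reflects the full 20 epochs.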
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4