train_cb_1755694499

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set:

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10.0
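The cosine schedule with a 0.1 warmup ratio can be sketched in plain Python. This is a minimal sketch assuming the run used a schedule equivalent to the `transformers` function `get_cosine_schedule_with_warmup` (linear warmup from 0, then a half cosine decay to 0). `cosine_warmup_lr` is a hypothetical helper, and the total of 1130 steps is an estimate derived from the Step and Epoch columns of the results table (about 113 steps per epoch), not a value stated in this card.

```python
import math


def cosine_warmup_lr(step: int, total_steps: int,
                     base_lr: float = 5e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to 0.

    Hypothetical helper; assumes the schedule matches transformers'
    get_cosine_schedule_with_warmup with its default half cosine cycle.
    """
    # Warmup length is the warmup ratio applied to the total optimizer steps.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half cosine: factor 1.0 right after warmup, 0.0 at the final step.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Assumed total: ~113 optimizer steps per epoch (estimated from the results
# table) times 10 epochs.
TOTAL_STEPS = 113 * 10
for s in (0, 57, 113, 570, 1130):
    print(s, f"{cosine_warmup_lr(s, TOTAL_STEPS):.2e}")
```

With these settings the learning rate peaks at 5e-05 once warmup ends (roughly the end of epoch 1) and then decays smoothly to zero by the last step.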

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.5865	0.5044	57	0.5033	17136
0.5224	1.0088	114	1.1373	32376
0.2822	1.5133	171	0.7325	48728
0.2233	2.0177	228	0.2152	64040
0.2704	2.5221	285	0.1964	79784
0.0203	3.0265	342	0.2481	96200
0.137	3.5310	399	0.2158	112440
0.2143	4.0354	456	0.2686	128712
0.0541	4.5398	513	0.5688	143944
0.2174	5.0442	570	0.2900	160016
0.0539	5.5487	627	0.4841	176688
0.0375	6.0531	684	0.2865	192272
0.0032	6.5575	741	0.3885	208944
0.0002	7.0619	798	0.4208	224288
0.0031	7.5664	855	0.4617	239840
0.0002	8.0708	912	0.4403	255984
0.0031	8.5752	969	0.4409	272064
0.0013	9.0796	1026	0.4438	287928
0.0007	9.5841	1083	0.4399	303800
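The logged rows can be summarized programmatically. The sketch below copies the table into Python and picks the row with the lowest validation loss; `LogRow` and `best_checkpoint` are hypothetical helper names, not part of any library. On these numbers the lowest validation loss (0.1964) falls at step 285, around epoch 2.5, while the training loss keeps falling toward zero afterwards and the validation loss settles near 0.44, which suggests the later checkpoints overfit the training data.

```python
from typing import List, NamedTuple


class LogRow(NamedTuple):
    """One row of the training-results table (hypothetical container)."""
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    tokens_seen: int


# Values copied from the training-results table.
ROWS: List[LogRow] = [
    LogRow(0.5865, 0.5044, 57, 0.5033, 17136),
    LogRow(0.5224, 1.0088, 114, 1.1373, 32376),
    LogRow(0.2822, 1.5133, 171, 0.7325, 48728),
    LogRow(0.2233, 2.0177, 228, 0.2152, 64040),
    LogRow(0.2704, 2.5221, 285, 0.1964, 79784),
    LogRow(0.0203, 3.0265, 342, 0.2481, 96200),
    LogRow(0.137, 3.5310, 399, 0.2158, 112440),
    LogRow(0.2143, 4.0354, 456, 0.2686, 128712),
    LogRow(0.0541, 4.5398, 513, 0.5688, 143944),
    LogRow(0.2174, 5.0442, 570, 0.2900, 160016),
    LogRow(0.0539, 5.5487, 627, 0.4841, 176688),
    LogRow(0.0375, 6.0531, 684, 0.2865, 192272),
    LogRow(0.0032, 6.5575, 741, 0.3885, 208944),
    LogRow(0.0002, 7.0619, 798, 0.4208, 224288),
    LogRow(0.0031, 7.5664, 855, 0.4617, 239840),
    LogRow(0.0002, 8.0708, 912, 0.4403, 255984),
    LogRow(0.0031, 8.5752, 969, 0.4409, 272064),
    LogRow(0.0013, 9.0796, 1026, 0.4438, 287928),
    LogRow(0.0007, 9.5841, 1083, 0.4399, 303800),
]


def best_checkpoint(rows: List[LogRow]) -> LogRow:
    """Return the logged row with the lowest validation loss."""
    return min(rows, key=lambda r: r.val_loss)


best = best_checkpoint(ROWS)
final = ROWS[-1]
# Optimizer steps per epoch, estimated from the last logged row.
steps_per_epoch = final.step / final.epoch

print(f"lowest validation loss {best.val_loss} at step {best.step} (epoch {best.epoch})")
print(f"final validation loss {final.val_loss} at step {final.step} (epoch {final.epoch})")
print(f"about {round(steps_per_epoch)} optimizer steps per epoch")
```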


Base model: meta-llama/Meta-Llama-3-8B-Instruct (this model is an adapter of it)