train_cb_1756729050

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set:

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10.0
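
Together, learning_rate, lr_scheduler_type and lr_scheduler_warmup_ratio define a learning-rate curve: a linear warmup over the first 10% of optimizer steps up to 5e-05, followed by a cosine decay toward 0. The sketch below recomputes that curve in plain Python. It assumes 113 optimizer steps per epoch (inferred from the Step and Epoch columns of the results table, e.g. step 114 is logged at epoch 1.0088), no gradient accumulation, and warmup steps rounded up with math.ceil. The helper name cosine_with_warmup_lr is hypothetical, and the formula is the common linear-warmup-plus-cosine shape, not code taken from the run that produced this model.

```python
import math

# Values from the hyperparameter list above.
LEARNING_RATE = 5e-05
WARMUP_RATIO = 0.1
NUM_EPOCHS = 10
# Assumption: 113 optimizer steps per epoch, inferred from the results table
# (step 114 is logged at epoch 1.0088, and 114 / 113 is about 1.0088).
STEPS_PER_EPOCH = 113


def cosine_with_warmup_lr(step: int, total_steps: int, warmup_steps: int,
                          peak_lr: float) -> float:
    """Hypothetical helper: linear warmup to peak_lr, then cosine decay to 0."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup steps.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half a cosine period: factor goes from 1.0 down to 0.0.
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


total_steps = STEPS_PER_EPOCH * NUM_EPOCHS             # optimizer steps in the run
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)   # first 10% of steps

for step in (0, 57, warmup_steps, 565, total_steps):
    lr = cosine_with_warmup_lr(step, total_steps, warmup_steps, LEARNING_RATE)
    print(f"step {step:4d}: lr = {lr:.3e}")
```

Under these assumptions the run has 1130 optimizer steps, the first 113 of which are warmup, so the peak learning rate is reached at roughly the end of the first epoch.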

Training results

Training Loss  Epoch   Step  Validation Loss  Input Tokens Seen
0.6315         0.5044  57    0.4110           17136
0.4932         1.0088  114   0.8902           32376
0.3213         1.5133  171   0.8148           48728
0.429          2.0177  228   0.2493           64040
0.1579         2.5221  285   0.2036           79784
0.0141         3.0265  342   0.2938           96200
0.1094         3.5310  399   0.2391           112440
0.2226         4.0354  456   0.2171           128712
0.0679         4.5398  513   0.3177           143944
0.286          5.0442  570   0.2677           160016
0.0158         5.5487  627   0.3665           176688
0.027          6.0531  684   0.2993           192272
0.0012         6.5575  741   0.3299           208944
0.0001         7.0619  798   0.2633           224288
0.0017         7.5664  855   0.2684           239840
0.0001         8.0708  912   0.2846           255984
0.0008         8.5752  969   0.2800           272064
0.0003         9.0796  1026  0.2731           287928
0.0001         9.5841  1083  0.2796           303800
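
The results table can also be read programmatically, for example to locate the checkpoint with the lowest validation loss. The sketch below parses the logged rows and picks that minimum; ROWS, LogRow and parse_rows are illustrative names, not part of any library. On these numbers the lowest validation loss is 0.2036 at step 285 (epoch 2.5221), while training loss keeps falling to about 0.0001 and the last logged validation loss is 0.2796, which suggests the later epochs mostly fit the training split more tightly without improving evaluation loss. The card does not say which checkpoint was kept as the final model.

```python
from typing import List, NamedTuple

# Rows copied from the training results table above:
# training loss, epoch, step, validation loss, input tokens seen.
ROWS = """
0.6315 0.5044 57 0.4110 17136
0.4932 1.0088 114 0.8902 32376
0.3213 1.5133 171 0.8148 48728
0.429 2.0177 228 0.2493 64040
0.1579 2.5221 285 0.2036 79784
0.0141 3.0265 342 0.2938 96200
0.1094 3.5310 399 0.2391 112440
0.2226 4.0354 456 0.2171 128712
0.0679 4.5398 513 0.3177 143944
0.286 5.0442 570 0.2677 160016
0.0158 5.5487 627 0.3665 176688
0.027 6.0531 684 0.2993 192272
0.0012 6.5575 741 0.3299 208944
0.0001 7.0619 798 0.2633 224288
0.0017 7.5664 855 0.2684 239840
0.0001 8.0708 912 0.2846 255984
0.0008 8.5752 969 0.2800 272064
0.0003 9.0796 1026 0.2731 287928
0.0001 9.5841 1083 0.2796 303800
"""


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    tokens_seen: int


def parse_rows(text: str) -> List[LogRow]:
    """Parse whitespace-separated table rows into LogRow records."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, tokens = line.split()
        rows.append(LogRow(float(train_loss), float(epoch), int(step),
                           float(val_loss), int(tokens)))
    return rows


rows = parse_rows(ROWS)
best = min(rows, key=lambda r: r.val_loss)  # evaluation with the lowest loss
last = rows[-1]                             # final logged evaluation
print(f"lowest validation loss: {best.val_loss} at step {best.step} (epoch {best.epoch})")
print(f"last logged validation loss: {last.val_loss} at step {last.step}")
```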

This model is an adapter of the base model meta-llama/Meta-Llama-3-8B-Instruct.