train_cb_1756729608

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set:

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10.0
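
To make the scheduler settings above concrete, here is a minimal sketch of a cosine schedule with linear warmup, using only the Python standard library. It mirrors the commonly used shape (linear ramp to the peak rate, then half-cosine decay to zero) and is not taken from the training code. The function name `cosine_lr` and the total of 1130 optimizer steps are assumptions: the total is inferred from the results table, where step 57 falls at epoch 0.5044, giving roughly 113 steps per epoch over 10 epochs.

```python
import math


def cosine_lr(step, total_steps, base_lr=5e-05, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by cosine decay.

    Hypothetical helper for illustration; base_lr and warmup_ratio default
    to the values listed in the hyperparameters above.
    """
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from base_lr down to 0.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: about 113 steps per epoch (57 / 0.5044) times 10 epochs.
TOTAL_STEPS = 1130

for step in (0, 57, 113, 570, 1130):
    print(step, f"{cosine_lr(step, TOTAL_STEPS):.2e}")
```

Under these assumptions the rate peaks at 5e-05 around step 113, the end of the first epoch, and decays to zero by the final step.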

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.6373	0.5044	57	0.4306	17136
0.3965	1.0088	114	0.9618	32376
0.3048	1.5133	171	0.7284	48728
0.2021	2.0177	228	0.2919	64040
0.4291	2.5221	285	0.8386	79784
0.0067	3.0265	342	0.3568	96200
0.15	3.5310	399	0.3280	112440
0.1948	4.0354	456	0.2427	128712
0.1894	4.5398	513	0.4788	143944
0.2664	5.0442	570	0.3219	160016
0.0356	5.5487	627	0.4499	176688
0.0116	6.0531	684	0.2823	192272
0.0005	6.5575	741	0.4588	208944
0.0509	7.0619	798	0.3445	224288
0.0031	7.5664	855	0.4802	239840
0.0002	8.0708	912	0.4264	255984
0.0014	8.5752	969	0.3699	272064
0.0057	9.0796	1026	0.4110	287928
0.0004	9.5841	1083	0.3927	303800
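
One way to read the table above is to look for the evaluation step with the lowest validation loss. The following sketch hard-codes the rows as reported and picks that step; the names `ROWS`, `records`, and `best` are illustrative only, and the table does not say which checkpoint the published adapter corresponds to.

```python
# Rows copied from the table above:
# training loss, epoch, step, validation loss, input tokens seen.
ROWS = """\
0.6373 0.5044 57 0.4306 17136
0.3965 1.0088 114 0.9618 32376
0.3048 1.5133 171 0.7284 48728
0.2021 2.0177 228 0.2919 64040
0.4291 2.5221 285 0.8386 79784
0.0067 3.0265 342 0.3568 96200
0.15 3.5310 399 0.3280 112440
0.1948 4.0354 456 0.2427 128712
0.1894 4.5398 513 0.4788 143944
0.2664 5.0442 570 0.3219 160016
0.0356 5.5487 627 0.4499 176688
0.0116 6.0531 684 0.2823 192272
0.0005 6.5575 741 0.4588 208944
0.0509 7.0619 798 0.3445 224288
0.0031 7.5664 855 0.4802 239840
0.0002 8.0708 912 0.4264 255984
0.0014 8.5752 969 0.3699 272064
0.0057 9.0796 1026 0.4110 287928
0.0004 9.5841 1083 0.3927 303800
"""

records = []
for line in ROWS.splitlines():
    train_loss, epoch, step, val_loss, tokens = line.split()
    records.append(
        {
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "tokens": int(tokens),
        }
    )

# Evaluation step with the lowest validation loss.
best = min(records, key=lambda r: r["val_loss"])
print(best["step"], best["epoch"], best["val_loss"])  # 456 4.0354 0.2427
```

The minimum falls at step 456 (epoch 4.0354). After that point the training loss keeps shrinking toward zero while the validation loss stays higher, which may indicate overfitting on a dataset this small.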
