train_cb_1757081470

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set:

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10.0
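
As a rough illustration of what learning_rate: 5e-05, lr_scheduler_type: cosine and lr_scheduler_warmup_ratio: 0.1 imply together, the sketch below computes the learning rate at a given optimizer step using a linear warmup followed by cosine decay to zero. The function name cosine_lr_with_warmup is hypothetical, and the total of 1130 steps is an assumption estimated from the results table (about 113 steps per epoch over 10 epochs); the exact rounding used by the training framework may differ slightly.

```python
import math


def cosine_lr_with_warmup(step, total_steps, base_lr=5e-05, warmup_ratio=0.1):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero.

    Hypothetical helper for illustration; not taken from the training code.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: ~113 steps per epoch x 10 epochs, estimated from the results table.
TOTAL_STEPS = 1130

for step in (0, 57, 113, 570, 1130):
    print(step, cosine_lr_with_warmup(step, TOTAL_STEPS))
```

Under these assumptions the rate peaks at 5e-05 around step 113 (roughly the end of the first epoch) and decays to zero by the final step.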

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.3315	0.5044	57	0.2731	15568
0.1533	1.0088	114	1.0623	30760
0.2727	1.5133	171	0.5874	46120
0.3404	2.0177	228	0.2549	61792
0.6691	2.5221	285	0.3684	77136
0.4382	3.0265	342	0.2316	92944
0.3287	3.5310	399	0.4520	108704
0.3698	4.0354	456	0.2106	123744
0.3123	4.5398	513	0.2462	139232
0.0127	5.0442	570	0.2971	154632
0.1652	5.5487	627	0.4199	169832
0.0145	6.0531	684	0.4591	185424
0.0089	6.5575	741	0.5756	201168
0.0062	7.0619	798	0.4575	216400
0.0005	7.5664	855	0.4710	231792
0.0008	8.0708	912	0.4744	247656
0.0025	8.5752	969	0.4987	263304
0.0004	9.0796	1026	0.4878	278160
0.0007	9.5841	1083	0.4866	293584
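
To make the table easier to read, the following minimal sketch picks the evaluation step with the lowest validation loss. The rows are copied from the table above; the helper name best_checkpoint is hypothetical, and whether a checkpoint was actually saved at that step is not stated in this card.

```python
# (step, epoch, validation loss) copied from the results table above.
EVAL_LOG = [
    (57, 0.5044, 0.2731), (114, 1.0088, 1.0623), (171, 1.5133, 0.5874),
    (228, 2.0177, 0.2549), (285, 2.5221, 0.3684), (342, 3.0265, 0.2316),
    (399, 3.5310, 0.4520), (456, 4.0354, 0.2106), (513, 4.5398, 0.2462),
    (570, 5.0442, 0.2971), (627, 5.5487, 0.4199), (684, 6.0531, 0.4591),
    (741, 6.5575, 0.5756), (798, 7.0619, 0.4575), (855, 7.5664, 0.4710),
    (912, 8.0708, 0.4744), (969, 8.5752, 0.4987), (1026, 9.0796, 0.4878),
    (1083, 9.5841, 0.4866),
]


def best_checkpoint(log):
    """Return the (step, epoch, validation loss) row with the lowest validation loss."""
    return min(log, key=lambda row: row[2])


step, epoch, val_loss = best_checkpoint(EVAL_LOG)
print(f"lowest validation loss {val_loss} at step {step} (epoch {epoch})")
```

The minimum falls at step 456, about four epochs in. After that point the training loss keeps dropping toward zero while the validation loss climbs back to roughly 0.46 to 0.58, which is consistent with overfitting in the later epochs.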
