train_cb_1757081469

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. Validation loss on the evaluation set is reported per checkpoint in the training results table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 456
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10.0
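
To make the schedule settings above concrete, here is a minimal sketch of how a cosine schedule with a 0.1 warmup ratio could shape the learning rate over the run. It assumes the usual linear-warmup-then-half-cosine-decay form; the total of 1130 steps is an inference from the training log (step 57 falls at epoch 0.5044, so roughly 113 steps per epoch over 10 epochs), and `lr_at` is a hypothetical helper, not part of any library. The exact rounding of the warmup step count may differ in the actual trainer.

```python
import math

BASE_LR = 5e-05      # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 1130   # assumption: ~113 steps/epoch x 10 epochs, inferred from the log


def lr_at(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_ratio=WARMUP_RATIO):
    """Learning rate at a given optimizer step (hypothetical helper)."""
    # Assumption: warmup length is the ratio times total steps, rounded.
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Half-cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for s in (0, 57, 113, 570, 1130):
        print(f"step {s:>4}: lr = {lr_at(s):.3e}")
```

Under these assumptions the learning rate peaks at 5e-05 around step 113 (the end of the first epoch) and decays toward zero by the final step.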

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.8643	0.5044	57	0.4547	15664
0.2715	1.0088	114	0.5620	31304
0.3501	1.5133	171	0.2233	48008
0.5262	2.0177	228	0.3286	63040
0.2992	2.5221	285	0.1377	79104
0.0962	3.0265	342	0.1584	93888
0.0753	3.5310	399	0.3117	109952
0.4213	4.0354	456	0.1032	125688
0.0041	4.5398	513	0.1092	141000
0.2002	5.0442	570	0.1037	156920
0.0287	5.5487	627	0.0836	172504
0.0005	6.0531	684	0.0518	188160
0.0071	6.5575	741	0.0129	204240
0.0039	7.0619	798	0.0223	219512
0.0002	7.5664	855	0.0217	235256
0.0007	8.0708	912	0.0131	251392
0.0022	8.5752	969	0.0146	266784
0.0002	9.0796	1026	0.0154	281976
0.0006	9.5841	1083	0.0161	297688
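
The log above can be inspected programmatically. The following is a small sketch that parses the rows exactly as listed, finds the checkpoint with the lowest validation loss, and estimates the number of steps per epoch. The names `Row` and `parse_log` are illustrative and not part of any training library.

```python
from typing import NamedTuple

# Rows copied from the training log above:
# training loss, epoch, step, validation loss, input tokens seen.
LOG = """\
0.8643 0.5044 57 0.4547 15664
0.2715 1.0088 114 0.5620 31304
0.3501 1.5133 171 0.2233 48008
0.5262 2.0177 228 0.3286 63040
0.2992 2.5221 285 0.1377 79104
0.0962 3.0265 342 0.1584 93888
0.0753 3.5310 399 0.3117 109952
0.4213 4.0354 456 0.1032 125688
0.0041 4.5398 513 0.1092 141000
0.2002 5.0442 570 0.1037 156920
0.0287 5.5487 627 0.0836 172504
0.0005 6.0531 684 0.0518 188160
0.0071 6.5575 741 0.0129 204240
0.0039 7.0619 798 0.0223 219512
0.0002 7.5664 855 0.0217 235256
0.0007 8.0708 912 0.0131 251392
0.0022 8.5752 969 0.0146 266784
0.0002 9.0796 1026 0.0154 281976
0.0006 9.5841 1083 0.0161 297688
"""


class Row(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    tokens_seen: int


def parse_log(text):
    """Turn whitespace-separated log lines into typed rows."""
    rows = []
    for line in text.strip().splitlines():
        tl, ep, st, vl, tok = line.split()
        rows.append(Row(float(tl), float(ep), int(st), float(vl), int(tok)))
    return rows


rows = parse_log(LOG)
best = min(rows, key=lambda r: r.val_loss)                # lowest validation loss
steps_per_epoch = round(rows[-1].step / rows[-1].epoch)   # estimate from the last row

if __name__ == "__main__":
    print(f"best checkpoint: step {best.step}, epoch {best.epoch}, val loss {best.val_loss}")
    print(f"estimated steps per epoch: {steps_per_epoch}")
```

In this log the lowest validation loss is 0.0129 at step 741 (epoch 6.5575). Later checkpoints stay between 0.0131 and 0.0223 while training loss is close to zero, which suggests the additional epochs bring little further improvement on the evaluation set.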
