train_cb_123_1757596077

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. Per-epoch results on the evaluation set are listed in the training results table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
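
To make the scheduler settings concrete, here is a minimal sketch of a linear-warmup, cosine-decay learning-rate curve built from the values above. It assumes the common formulation (linear ramp from 0 to the base rate over the warmup steps, then half-cosine decay to 0) and takes the total of 2260 steps from the last row of the training log; `lr_at` is a hypothetical helper written for illustration, not the code that was run for this model.

```python
import math

BASE_LR = 5e-05      # learning_rate from the list above
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 2260   # assumption: final step in the training log (20 epochs x 113 steps)


def lr_at(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_ratio=WARMUP_RATIO):
    """Learning rate after `step` optimizer updates (illustrative helper)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the base learning rate.
        return base_lr * step / max(1, warmup_steps)
    # Half-cosine decay from the base learning rate down to 0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # Print the rate at the end of a few epochs (113 steps per epoch in the log).
    for epoch in (1, 2, 5, 10, 20):
        print(epoch, lr_at(epoch * 113))
```

Under these assumptions the rate peaks at about the end of epoch 2 and then falls smoothly to 0 by the last step.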

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.7344	1.0	113	1.6176	31992
0.2992	2.0	226	0.3345	63544
0.2043	3.0	339	0.2614	95320
0.3552	4.0	452	0.2170	127288
0.0466	5.0	565	0.3203	158768
0.1211	6.0	678	0.2170	190304
0.1292	7.0	791	0.3284	222400
0.0002	8.0	904	0.2181	253488
0.0004	9.0	1017	0.4297	284920
0.0001	10.0	1130	0.3341	316840
0.0	11.0	1243	0.3159	348312
0.0001	12.0	1356	0.3290	379920
0.0	13.0	1469	0.3348	411568
0.0001	14.0	1582	0.3400	443536
0.0	15.0	1695	0.3370	475000
0.0	16.0	1808	0.3393	506880
0.0	17.0	1921	0.3446	538552
0.0	18.0	2034	0.3373	570488
0.0	19.0	2147	0.3423	602176
0.0	20.0	2260	0.3384	633448
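
The log above can be checked for the epoch with the lowest validation loss. The sketch below copies the epoch, training loss, and validation loss columns from the table and reports the best and final values; the variable names are illustrative. Training loss falls to roughly zero while validation loss settles well above its minimum, which is consistent with overfitting in later epochs, though the card does not say which checkpoint was kept.

```python
# (epoch, training loss, validation loss) copied from the table above.
LOG = [
    (1, 0.7344, 1.6176), (2, 0.2992, 0.3345), (3, 0.2043, 0.2614),
    (4, 0.3552, 0.2170), (5, 0.0466, 0.3203), (6, 0.1211, 0.2170),
    (7, 0.1292, 0.3284), (8, 0.0002, 0.2181), (9, 0.0004, 0.4297),
    (10, 0.0001, 0.3341), (11, 0.0, 0.3159), (12, 0.0001, 0.3290),
    (13, 0.0, 0.3348), (14, 0.0001, 0.3400), (15, 0.0, 0.3370),
    (16, 0.0, 0.3393), (17, 0.0, 0.3446), (18, 0.0, 0.3373),
    (19, 0.0, 0.3423), (20, 0.0, 0.3384),
]

# Lowest validation loss, and every epoch that reaches it.
best_val = min(val for _, _, val in LOG)
best_epochs = [epoch for epoch, _, val in LOG if val == best_val]

# Validation loss at the end of training, for comparison.
final_epoch, _, final_val = LOG[-1]

if __name__ == "__main__":
    print("best validation loss:", best_val, "at epochs", best_epochs)
    print("final validation loss:", final_val, "at epoch", final_epoch)
```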
