train_cb_1757340190

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. Per-epoch results on the evaluation set are listed in the training results table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
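
To make the scheduler settings above concrete, here is a minimal sketch of a cosine schedule with linear warmup, written in plain Python. It is an illustration under stated assumptions, not the training code: the function name `cosine_lr_with_warmup` is hypothetical, the total of 2260 optimizer steps is taken from the final row of the results table, and the warmup length is assumed to be the warmup ratio times the total step count (226 steps), with the rate decaying to zero at the last step.

```python
import math


def cosine_lr_with_warmup(step, total_steps=2260, warmup_ratio=0.1, peak_lr=5e-5):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay.

    Hypothetical helper for illustration; defaults mirror the hyperparameters
    listed in this card (learning_rate, lr_scheduler_warmup_ratio) and the
    final step count from the results table.
    """
    # Assumption: warmup length is warmup_ratio * total_steps (226 steps here).
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from peak_lr down to 0.
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the rate at the start, mid-warmup, peak, mid-decay, and final step.
    for step in (0, 113, 226, 1243, 2260):
        print(step, f"{cosine_lr_with_warmup(step):.2e}")
```

Under these assumptions the rate peaks at the end of epoch 2 (step 226), so most of the 20 epochs run on the decaying part of the curve.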

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.4348	1.0	113	0.6303	31992
0.5809	2.0	226	0.2471	63544
0.2713	3.0	339	0.2753	95320
0.5372	4.0	452	0.2236	127288
0.1443	5.0	565	0.3391	158768
0.161	6.0	678	0.1972	190304
0.1019	7.0	791	0.2021	222400
0.0051	8.0	904	0.3096	253488
0.0223	9.0	1017	0.3006	284920
0.0903	10.0	1130	0.3369	316840
0.0001	11.0	1243	0.3012	348312
0.0	12.0	1356	0.3449	379920
0.0001	13.0	1469	0.3429	411568
0.0001	14.0	1582	0.3481	443536
0.0	15.0	1695	0.3421	475000
0.0	16.0	1808	0.3372	506880
0.0	17.0	1921	0.3425	538552
0.0	18.0	2034	0.3449	570488
0.0	19.0	2147	0.3529	602176
0.0	20.0	2260	0.3424	633448
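
The card does not say which checkpoint was kept. As a rough sketch, one common way to read a table like the one above is to pick the epoch with the lowest validation loss. The snippet copies the epoch, training loss, and validation loss columns from the table; the helper name `best_epoch` is hypothetical.

```python
# (epoch, training loss, validation loss), copied from the results table.
rows = [
    (1, 0.4348, 0.6303), (2, 0.5809, 0.2471), (3, 0.2713, 0.2753),
    (4, 0.5372, 0.2236), (5, 0.1443, 0.3391), (6, 0.161, 0.1972),
    (7, 0.1019, 0.2021), (8, 0.0051, 0.3096), (9, 0.0223, 0.3006),
    (10, 0.0903, 0.3369), (11, 0.0001, 0.3012), (12, 0.0, 0.3449),
    (13, 0.0001, 0.3429), (14, 0.0001, 0.3481), (15, 0.0, 0.3421),
    (16, 0.0, 0.3372), (17, 0.0, 0.3425), (18, 0.0, 0.3449),
    (19, 0.0, 0.3529), (20, 0.0, 0.3424),
]


def best_epoch(rows):
    """Return the row with the lowest validation loss (hypothetical helper)."""
    return min(rows, key=lambda row: row[2])


if __name__ == "__main__":
    epoch, train_loss, val_loss = best_epoch(rows)
    print(f"lowest validation loss {val_loss} at epoch {epoch} "
          f"(training loss {train_loss})")
```

In this table the minimum falls at epoch 6. From epoch 11 onward the training loss is at or near zero while the validation loss stays around 0.30 to 0.35, a pattern that usually indicates overfitting.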
