train_cb_1757340239

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. Its results on the evaluation set are reported per epoch in the Validation Loss column of the training results table.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
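
To make the scheduler settings concrete, the sketch below computes the learning rate a cosine schedule with linear warmup would produce at a few steps. It is an illustration under stated assumptions, not the exact code used for this run: the total of 2260 steps is taken from the last row of the training results table, the warmup length is assumed to be the warmup ratio times the total step count, a single half-cosine decay to zero is assumed, and `cosine_lr` is a hypothetical helper name.

```python
import math

BASE_LR = 5e-05      # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 2260   # last step in the training results table (20 epochs x 113 steps)

# Assumption: warmup length is the warmup ratio applied to the total step count.
WARMUP_STEPS = round(WARMUP_RATIO * TOTAL_STEPS)


def cosine_lr(step: int) -> float:
    """Learning rate at an optimizer step (hypothetical helper, not a library call)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the base learning rate.
        return BASE_LR * step / WARMUP_STEPS
    # Half-cosine decay from the base learning rate down to 0.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the rate at the start, mid-warmup, peak, mid-decay, and final step.
for step in (0, 113, WARMUP_STEPS, 1243, TOTAL_STEPS):
    print(f"step {step:>4}: lr = {cosine_lr(step):.3e}")
```

Under these assumptions the warmup covers the first two epochs, so the peak learning rate is reached at the end of epoch 2.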

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.175	1.0	113	0.9233	30520
0.1677	2.0	226	0.4181	61312
0.5974	3.0	339	0.2543	92192
0.2458	4.0	452	0.3391	122752
0.1608	5.0	565	0.2920	153112
0.0079	6.0	678	0.4712	183568
0.0962	7.0	791	0.4413	214352
0.0015	8.0	904	0.3483	245208
0.0001	9.0	1017	0.4570	275632
0.0002	10.0	1130	0.4189	306152
0.0	11.0	1243	0.4947	336688
0.0	12.0	1356	0.4761	367392
0.0	13.0	1469	0.4681	398224
0.0	14.0	1582	0.4658	428448
0.0	15.0	1695	0.4718	459320
0.0	16.0	1808	0.4760	489768
0.0	17.0	1921	0.4684	520440
0.0	18.0	2034	0.4734	551464
0.0	19.0	2147	0.4698	582408
0.0	20.0	2260	0.4669	612968
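
The table shows the displayed training loss falling to 0.0 while the validation loss stays well above its minimum, which suggests overfitting in the later epochs. As a rough way to read the table, the sketch below copies the epoch, training loss, and validation loss columns and locates the epoch with the lowest validation loss. The variable names are illustrative, and the card does not state which checkpoint was kept.

```python
# (epoch, training_loss, validation_loss) copied from the table above.
RESULTS = [
    (1, 0.175, 0.9233), (2, 0.1677, 0.4181), (3, 0.5974, 0.2543),
    (4, 0.2458, 0.3391), (5, 0.1608, 0.2920), (6, 0.0079, 0.4712),
    (7, 0.0962, 0.4413), (8, 0.0015, 0.3483), (9, 0.0001, 0.4570),
    (10, 0.0002, 0.4189), (11, 0.0, 0.4947), (12, 0.0, 0.4761),
    (13, 0.0, 0.4681), (14, 0.0, 0.4658), (15, 0.0, 0.4718),
    (16, 0.0, 0.4760), (17, 0.0, 0.4684), (18, 0.0, 0.4734),
    (19, 0.0, 0.4698), (20, 0.0, 0.4669),
]

# Epoch with the lowest validation loss.
best_epoch, _, best_val = min(RESULTS, key=lambda row: row[2])

# First epoch whose training loss is displayed as 0.0 (rounded in the table).
first_zero_epoch = next(epoch for epoch, train, _ in RESULTS if train == 0.0)

final_epoch, _, final_val = RESULTS[-1]

print(f"lowest validation loss: {best_val} at epoch {best_epoch}")
print(f"training loss first shown as 0.0 at epoch {first_zero_epoch}")
print(f"validation loss at epoch {final_epoch}: {final_val}")
```

If a checkpoint is to be selected by validation loss, a comparison like this points to an early epoch rather than the final one.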
