# train_cb_101112_1757596151
This model is a PEFT fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the cb dataset. It achieves the following results on the evaluation set:
- Loss: 0.1226
- Num Input Tokens Seen: 621040
## Model description
More information needed
## Intended uses & limitations
More information needed
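Neither section is filled in, but since the framework versions below list PEFT, this repository presumably holds an adapter rather than full model weights. A minimal loading-and-inference sketch, assuming the adapter lives in this repo (rbelanec/train_cb_101112_1757596151) and that you have access to the gated base model; the prompt shape is an assumption, as the card does not document the template used for the cb task:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cb_101112_1757596151"  # this repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter

# Assumed prompt shape -- the card does not document the cb template used in training.
messages = [{"role": "user", "content": "Premise: It rained all day.\n"
                                        "Hypothesis: The ground is wet.\n"
                                        "Label (entailment/contradiction/neutral)?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=8)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```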
## Training and evaluation data
More information needed
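The card does not say which dataset "cb" refers to. If it is the CommitmentBank task from SuperGLUE (a plausible but unconfirmed assumption), it can be inspected like this:

```python
from datasets import load_dataset

# Assumption: "cb" is SuperGLUE CommitmentBank; the card does not confirm this.
cb = load_dataset("super_glue", "cb")
print(cb)               # DatasetDict with train/validation/test splits
print(cb["train"][0])   # fields: premise, hypothesis, idx, label
```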
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 101112
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
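The training script itself is not included, but the values above map directly onto `transformers.TrainingArguments`. A minimal sketch of the configuration (`output_dir` is assumed; everything else comes from the list above):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_cb_101112_1757596151",  # assumed; not stated in the card
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```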
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.5565 | 1.0 | 113 | 0.3490 | 30240 |
| 0.6263 | 2.0 | 226 | 0.3298 | 61600 |
| 0.4408 | 3.0 | 339 | 0.1773 | 92552 |
| 0.3915 | 4.0 | 452 | 0.2358 | 123976 |
| 0.0154 | 5.0 | 565 | 0.2813 | 155224 |
| 0.1362 | 6.0 | 678 | 0.1831 | 186368 |
| 0.0329 | 7.0 | 791 | 0.1248 | 217280 |
| 0.0004 | 8.0 | 904 | 0.0106 | 248064 |
| 0.0001 | 9.0 | 1017 | 0.1456 | 278576 |
| 0.0001 | 10.0 | 1130 | 0.1819 | 309584 |
| 0.0001 | 11.0 | 1243 | 0.2099 | 340752 |
| 0.0 | 12.0 | 1356 | 0.1466 | 372240 |
| 0.0001 | 13.0 | 1469 | 0.1362 | 402976 |
| 0.0001 | 14.0 | 1582 | 0.1331 | 433800 |
| 0.0001 | 15.0 | 1695 | 0.1305 | 465096 |
| 0.0001 | 16.0 | 1808 | 0.1263 | 496184 |
| 0.0 | 17.0 | 1921 | 0.1279 | 527400 |
| 0.0 | 18.0 | 2034 | 0.1253 | 558656 |
| 0.0 | 19.0 | 2147 | 0.1310 | 589928 |
| 0.0 | 20.0 | 2260 | 0.1226 | 621040 |
### Framework versions
- PEFT 0.15.2
- Transformers 4.51.3
- PyTorch 2.8.0+cu128
- Datasets 3.6.0
- Tokenizers 0.21.1