train_cb_1757081471

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. The validation loss at each evaluation step is listed in the results table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 101112
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10.0
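
The scheduler settings above (cosine decay, warmup ratio 0.1, peak learning rate 5e-05) can be illustrated with a small sketch of a linear-warmup, cosine-decay schedule. This is a plain-Python approximation of that schedule shape, not the training code itself. The function name `cosine_lr_with_warmup` is hypothetical, and the total of roughly 1130 optimizer steps is an assumption inferred from the results table (57 steps correspond to about 0.5044 epochs, so about 113 steps per epoch over 10 epochs).

```python
import math


def cosine_lr_with_warmup(step, warmup_steps, total_steps, base_lr=5e-05):
    """Learning rate after `step` optimizer updates.

    Linear warmup from 0 to `base_lr` over `warmup_steps`, followed by a
    half-cosine decay from `base_lr` down to 0 at `total_steps`.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# ASSUMPTION: about 1130 total steps, inferred from the results table
# (57 steps is roughly 0.5044 epochs, and training ran for 10 epochs).
TOTAL_STEPS = 1130
# A warmup ratio of 0.1 then corresponds to 113 warmup steps.
WARMUP_STEPS = 113

for step in (0, 57, 113, 570, 1083, 1130):
    lr = cosine_lr_with_warmup(step, WARMUP_STEPS, TOTAL_STEPS)
    print(f"step {step:>4}: lr = {lr:.3e}")
```

Under these assumptions the learning rate peaks at the end of the first epoch and decays toward zero by the final step, which is consistent with the training loss flattening in the later epochs.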

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.3757	0.5044	57	0.4772	16176
0.4219	1.0088	114	0.6849	30400
0.1657	1.5133	171	0.1717	45920
0.3074	2.0177	228	0.3108	62000
0.1047	2.5221	285	0.2146	78288
0.1619	3.0265	342	0.1639	93256
0.1383	3.5310	399	0.1937	109320
0.1130	4.0354	456	0.2352	125064
0.0119	4.5398	513	0.1938	140744
0.0008	5.0442	570	0.1800	156584
0.0475	5.5487	627	0.1202	172872
0.0332	6.0531	684	0.1327	188224
0.0006	6.5575	741	0.0924	203264
0.0007	7.0619	798	0.0406	219120
0.0004	7.5664	855	0.0956	234640
0.0004	8.0708	912	0.0689	250080
0.0001	8.5752	969	0.0648	265872
0.0003	9.0796	1026	0.0632	280832
0.0002	9.5841	1083	0.0600	295808
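
To make the table above easier to read, the following sketch picks out the evaluation step with the lowest validation loss. The epoch, step, and validation-loss values are copied from the table; the names `EVAL_LOG` and `best_checkpoint` are hypothetical helpers introduced only for this illustration.

```python
# Evaluation log copied from the results table:
# (epoch, step, validation_loss)
EVAL_LOG = [
    (0.5044, 57, 0.4772),
    (1.0088, 114, 0.6849),
    (1.5133, 171, 0.1717),
    (2.0177, 228, 0.3108),
    (2.5221, 285, 0.2146),
    (3.0265, 342, 0.1639),
    (3.5310, 399, 0.1937),
    (4.0354, 456, 0.2352),
    (4.5398, 513, 0.1938),
    (5.0442, 570, 0.1800),
    (5.5487, 627, 0.1202),
    (6.0531, 684, 0.1327),
    (6.5575, 741, 0.0924),
    (7.0619, 798, 0.0406),
    (7.5664, 855, 0.0956),
    (8.0708, 912, 0.0689),
    (8.5752, 969, 0.0648),
    (9.0796, 1026, 0.0632),
    (9.5841, 1083, 0.0600),
]


def best_checkpoint(log):
    """Return the (epoch, step, validation_loss) row with the lowest loss."""
    return min(log, key=lambda row: row[2])


epoch, step, loss = best_checkpoint(EVAL_LOG)
print(f"lowest validation loss {loss:.4f} at step {step} (epoch {epoch:.2f})")
```

The lowest logged validation loss (0.0406 at step 798) is below the last logged value (0.0600 at step 1083). The card does not state which checkpoint the published weights correspond to.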
