train_wsc_42_1760466772

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

Loss: 0.3524
Num Input Tokens Seen: 1468632

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: adamw_torch with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.15
num_epochs: 30
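
The relationship between these values can be checked with a little arithmetic. The sketch below is not the training code. It assumes the 840 total optimizer steps reported in the training results, assumes the warmup length is the ceiling of warmup ratio times total steps, and uses the common linear-warmup plus half-cosine decay formula. The helper name `lr_at` is hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2
WARMUP_RATIO = 0.15
NUM_EPOCHS = 30

# Assumption: total optimizer steps, taken from the last row of the results table.
TOTAL_STEPS = 840

# Effective batch size per optimizer step (single device assumed).
effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

# Optimizer steps per epoch implied by the step and epoch totals.
steps_per_epoch = TOTAL_STEPS // NUM_EPOCHS

# Assumption: warmup length is ceil(ratio * total steps).
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

Under these assumptions the learning rate peaks at the end of warmup and decays to zero by the final step, so the later rows of the results table were produced with progressively smaller updates.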

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.4818	1.5	42	0.3624	73824
0.3847	3.0	84	0.3849	146552
0.3464	4.5	126	0.3481	221264
0.3578	6.0	168	0.3743	294256
0.3492	7.5	210	0.3458	368144
0.3542	9.0	252	0.3495	439768
0.3400	10.5	294	0.3504	514888
0.3578	12.0	336	0.3650	586448
0.3462	13.5	378	0.3538	662016
0.3506	15.0	420	0.3557	735680
0.3489	16.5	462	0.3519	810232
0.3528	18.0	504	0.3558	882920
0.3408	19.5	546	0.3517	957488
0.3469	21.0	588	0.3542	1029792
0.3488	22.5	630	0.3554	1103160
0.3402	24.0	672	0.3529	1176968
0.3422	25.5	714	0.3558	1250064
0.3474	27.0	756	0.3538	1321408
0.3409	28.5	798	0.3524	1394904
0.3344	30.0	840	0.3524	1468632
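
To read the table more easily, the validation losses can be scanned for the best evaluation point. The sketch below copies the epoch, step, and validation loss columns from the table above; the variable names are illustrative only.

```python
# (epoch, step, validation_loss) rows copied from the results table.
EVAL_ROWS = [
    (1.5, 42, 0.3624), (3.0, 84, 0.3849), (4.5, 126, 0.3481), (6.0, 168, 0.3743),
    (7.5, 210, 0.3458), (9.0, 252, 0.3495), (10.5, 294, 0.3504), (12.0, 336, 0.3650),
    (13.5, 378, 0.3538), (15.0, 420, 0.3557), (16.5, 462, 0.3519), (18.0, 504, 0.3558),
    (19.5, 546, 0.3517), (21.0, 588, 0.3542), (22.5, 630, 0.3554), (24.0, 672, 0.3529),
    (25.5, 714, 0.3558), (27.0, 756, 0.3538), (28.5, 798, 0.3524), (30.0, 840, 0.3524),
]

# Row with the lowest validation loss.
best_epoch, best_step, best_loss = min(EVAL_ROWS, key=lambda row: row[2])

# Final row, which matches the loss reported at the top of the card.
final_epoch, final_step, final_loss = EVAL_ROWS[-1]

# Gap between the final and the best validation loss.
gap = final_loss - best_loss

print(f"best: step {best_step} (epoch {best_epoch}), loss {best_loss:.4f}")
print(f"final: step {final_step} (epoch {final_epoch}), loss {final_loss:.4f}")
```

In this table the lowest validation loss occurs well before the final epoch, and the values after that point stay within a narrow band around 0.35.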

Framework versions

PEFT 0.15.2
Transformers 4.51.3
Pytorch 2.8.0+cu128
Datasets 3.6.0
Tokenizers 0.21.1
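
Since PEFT is listed among the framework versions, the published weights are presumably an adapter to be applied on top of the base model. The following is a minimal loading sketch under that assumption. The function name `load_adapter` is hypothetical, the repository id is the one shown on the model page, and access to the base model may require accepting its license.

```python
BASE_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_wsc_42_1760466772"


def load_adapter(base_id: str = BASE_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the adapter (assumes a PEFT-format adapter)."""
    # Imported here so that defining the function does not require the libraries.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)

    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode
    return model, tokenizer
```

Calling `load_adapter()` downloads both the base model and the adapter, so it needs network access and enough memory for an 8B-parameter model.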
