train_stsb_101112_1760638037 / README.md

rbelanec

End of training

a1a0556 verified 2 months ago

metadata

library_name: peft
license: llama3
base_model: meta-llama/Meta-Llama-3-8B-Instruct
tags:
  - base_model:adapter:meta-llama/Meta-Llama-3-8B-Instruct
  - llama-factory
  - transformers
pipeline_tag: text-generation
model-index:
  - name: train_stsb_101112_1760638037
    results: []

train_stsb_101112_1760638037

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

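Loss: 0.4393 (the lowest validation loss in the training results, reached at epoch 5)
Num Input Tokens Seen: 8712528 (total at the end of all 20 epochs)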
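Because the metadata lists `library_name: peft` and an adapter tag, this repository presumably holds adapter weights to be applied on top of the base model. The sketch below shows one way that could be done. The adapter repository id is an assumption pieced together from the author and model name on this page, `load_adapter_model` is a hypothetical helper, and access to the base model weights is taken for granted. The card does not document a prompt format, so none is shown.

```python
# Minimal loading sketch, not the author's documented usage.
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # from the card metadata
ADAPTER_ID = "rbelanec/train_stsb_101112_1760638037"   # ASSUMPTION: author/name taken from this page


def load_adapter_model(adapter_id=ADAPTER_ID, base_id=BASE_MODEL_ID):
    """Load the base model and attach the PEFT adapter (hypothetical helper)."""
    # Imported lazily so the module can be inspected without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
    model = PeftModel.from_pretrained(base, adapter_id)  # wraps the base model with the adapter
    model.eval()
    return tokenizer, model
```

Calling `load_adapter_model()` downloads the 8B base model, so it is best run on a machine with enough memory and disk space.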

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 101112
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
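
To make the schedule settings concrete, the sketch below computes the learning rate at a given step. It assumes the usual linear warmup followed by a half-cosine decay to zero, that the schedule advances once per logged step, and that the run has 25880 steps in total (the final step in the training results). The function name `lr_at` is illustrative only.

```python
import math

PEAK_LR = 0.03        # learning_rate
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio
TOTAL_STEPS = 25880   # ASSUMPTION: final step reported in the training results

WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step):
    """Learning rate at `step` under linear warmup + cosine decay (assumed shape)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for s in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(s, lr_at(s))
```

Under these assumptions the warmup covers the first 2588 steps, which matches the step count logged at the end of epoch 2.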

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.6419	1.0	1294	0.4976	434624
0.402	2.0	2588	0.4681	869056
0.3758	3.0	3882	0.4575	1304160
0.4497	4.0	5176	0.4449	1740416
0.3365	5.0	6470	0.4393	2175568
0.4103	6.0	7764	0.4405	2611168
0.4	7.0	9058	0.4405	3047200
0.4053	8.0	10352	0.4508	3482720
0.3003	9.0	11646	0.4752	3918416
0.4131	10.0	12940	0.4874	4355072
0.4063	11.0	14234	0.4960	4790336
0.3632	12.0	15528	0.5173	5227040
0.2523	13.0	16822	0.5918	5662848
0.3145	14.0	18116	0.5935	6099600
0.2143	15.0	19410	0.6925	6534256
0.2152	16.0	20704	0.7844	6968992
0.1456	17.0	21998	0.9381	7405040
0.1341	18.0	23292	1.0220	7840784
0.0741	19.0	24586	1.0400	8276160
0.3049	20.0	25880	1.0398	8712528
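
The table can be scanned programmatically to find the epoch with the lowest validation loss. The sketch below copies the epoch, step and validation-loss columns from the table above. The names `RESULTS` and `best_epoch` are illustrative and are not part of any training script.

```python
# (epoch, step, validation_loss) copied from the training results table.
RESULTS = [
    (1, 1294, 0.4976), (2, 2588, 0.4681), (3, 3882, 0.4575), (4, 5176, 0.4449),
    (5, 6470, 0.4393), (6, 7764, 0.4405), (7, 9058, 0.4405), (8, 10352, 0.4508),
    (9, 11646, 0.4752), (10, 12940, 0.4874), (11, 14234, 0.4960), (12, 15528, 0.5173),
    (13, 16822, 0.5918), (14, 18116, 0.5935), (15, 19410, 0.6925), (16, 20704, 0.7844),
    (17, 21998, 0.9381), (18, 23292, 1.0220), (19, 24586, 1.0400), (20, 25880, 1.0398),
]


def best_epoch(results):
    """Return the (epoch, step, validation_loss) row with the lowest validation loss."""
    return min(results, key=lambda row: row[2])


if __name__ == "__main__":
    epoch, step, loss = best_epoch(RESULTS)
    final_loss = RESULTS[-1][2]
    print(f"best epoch: {epoch} (step {step}), validation loss {loss}")
    print(f"final validation loss: {final_loss}")
```

The minimum falls at epoch 5, and validation loss rises in most later epochs while training loss keeps dropping, which is the usual signature of overfitting.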

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
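
To compare a local environment against the versions listed above, a small check along these lines may help. The mapping from the listed names to pip distribution names (for example Pytorch to `torch`) is an assumption, and `check_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed in this card, keyed by the ASSUMED pip distribution name.
EXPECTED = {
    "peft": "0.17.1",
    "transformers": "4.51.3",
    "torch": "2.9.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def check_versions(expected=EXPECTED):
    """Map each package to (expected, installed or None, versions_match)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for package, (wanted, installed, ok) in check_versions().items():
        print(f"{package}: expected {wanted}, installed {installed}, match={ok}")
```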