smallm_140_rope

This model is a fine-tuned version of an unspecified base model, trained on an unspecified dataset. It achieves the following results on the evaluation set (see the perplexity conversion after the list):

  • Loss: 2.8849
  • Num Input Tokens Seen: 7864320000
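Assuming the reported loss is the mean per-token cross-entropy in nats (the Transformers default for causal language models), it converts directly to perplexity. A minimal sketch of that conversion, using only the standard library:

```python
import math

eval_loss = 2.8849  # final validation loss reported above
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.2f}")  # ~17.90
```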

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0006
  • train_batch_size: 64
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 256
  • optimizer: AdamW (apex fused) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 600
  • training_steps: 30000
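As a reference point, the settings above map onto the Hugging Face `TrainingArguments` API roughly as follows. This is a minimal sketch, not the original training script: `output_dir` is a placeholder, and a single training device is assumed (64 per-device × 4 accumulation = 256 total train batch size).

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters above; output_dir is a placeholder.
args = TrainingArguments(
    output_dir="smallm_140_rope",    # placeholder, actual value unknown
    learning_rate=6e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=4,   # 64 * 4 = 256 effective batch size
    optim="adamw_apex_fused",        # AdamW, betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="cosine",
    warmup_steps=600,
    max_steps=30_000,
)
```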

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:-----:|:---------------:|:-----------------:|
| 22.7795       | 0.0034 | 500   | 5.5598          | 131072000         |
| 17.4367       | 0.0067 | 1000  | 4.2485          | 262144000         |
| 15.7688       | 0.0101 | 1500  | 3.8666          | 393216000         |
| 14.9894       | 0.0135 | 2000  | 3.6752          | 524288000         |
| 14.5031       | 0.0168 | 2500  | 3.5482          | 655360000         |
| 14.2405       | 0.0202 | 3000  | 3.4691          | 786432000         |
| 13.9742       | 0.0235 | 3500  | 3.4102          | 917504000         |
| 13.7434       | 0.0269 | 4000  | 3.3646          | 1048576000        |
| 13.6039       | 0.0303 | 4500  | 3.3241          | 1179648000        |
| 13.4891       | 0.0336 | 5000  | 3.2917          | 1310720000        |
| 13.3718       | 0.0370 | 5500  | 3.2622          | 1441792000        |
| 13.3443       | 0.0404 | 6000  | 3.2371          | 1572864000        |
| 13.1945       | 0.0437 | 6500  | 3.2159          | 1703936000        |
| 13.0997       | 0.0471 | 7000  | 3.1944          | 1835008000        |
| 13.1213       | 0.0504 | 7500  | 3.1756          | 1966080000        |
| 12.977        | 0.0538 | 8000  | 3.1572          | 2097152000        |
| 12.8892       | 0.0572 | 8500  | 3.1434          | 2228224000        |
| 12.8258       | 0.0605 | 9000  | 3.1286          | 2359296000        |
| 12.8312       | 0.0639 | 9500  | 3.1131          | 2490368000        |
| 12.7303       | 0.0673 | 10000 | 3.1027          | 2621440000        |
| 12.6896       | 0.0706 | 10500 | 3.0884          | 2752512000        |
| 12.6273       | 0.0740 | 11000 | 3.0768          | 2883584000        |
| 12.6166       | 0.0774 | 11500 | 3.0653          | 3014656000        |
| 12.57         | 0.0807 | 12000 | 3.0549          | 3145728000        |
| 12.5742       | 0.0841 | 12500 | 3.0441          | 3276800000        |
| 12.4583       | 0.0874 | 13000 | 3.0347          | 3407872000        |
| 12.4483       | 0.0908 | 13500 | 3.0267          | 3538944000        |
| 12.3991       | 0.0942 | 14000 | 3.0163          | 3670016000        |
| 12.4215       | 0.0975 | 14500 | 3.0072          | 3801088000        |
| 12.367        | 0.1009 | 15000 | 2.9978          | 3932160000        |
| 12.3038       | 0.1043 | 15500 | 2.9913          | 4063232000        |
| 12.2584       | 0.1076 | 16000 | 2.9842          | 4194304000        |
| 12.2762       | 0.1110 | 16500 | 2.9755          | 4325376000        |
| 12.2252       | 0.1144 | 17000 | 2.9685          | 4456448000        |
| 12.2031       | 0.1177 | 17500 | 2.9616          | 4587520000        |
| 12.1477       | 0.1211 | 18000 | 2.9548          | 4718592000        |
| 12.1604       | 0.1244 | 18500 | 2.9489          | 4849664000        |
| 12.0931       | 0.1278 | 19000 | 2.9431          | 4980736000        |
| 12.0719       | 0.1312 | 19500 | 2.9375          | 5111808000        |
| 12.0672       | 0.1345 | 20000 | 2.9316          | 5242880000        |
| 12.0664       | 0.1379 | 20500 | 2.9268          | 5373952000        |
| 12.029        | 0.1413 | 21000 | 2.9219          | 5505024000        |
| 12.0514       | 0.1446 | 21500 | 2.9168          | 5636096000        |
| 12.0284       | 0.1480 | 22000 | 2.9136          | 5767168000        |
| 11.9863       | 0.1513 | 22500 | 2.9087          | 5898240000        |
| 11.9845       | 0.1547 | 23000 | 2.9054          | 6029312000        |
| 11.9285       | 0.1581 | 23500 | 2.9019          | 6160384000        |
| 11.9322       | 0.1614 | 24000 | 2.8988          | 6291456000        |
| 11.9627       | 0.1648 | 24500 | 2.8965          | 6422528000        |
| 11.9144       | 0.1682 | 25000 | 2.8939          | 6553600000        |
| 11.926        | 0.1715 | 25500 | 2.8921          | 6684672000        |
| 11.9298       | 0.1749 | 26000 | 2.8901          | 6815744000        |
| 11.9117       | 0.1783 | 26500 | 2.8888          | 6946816000        |
| 11.945        | 0.1816 | 27000 | 2.8873          | 7077888000        |
| 11.9267       | 0.1850 | 27500 | 2.8867          | 7208960000        |
| 11.9013       | 0.1883 | 28000 | 2.8857          | 7340032000        |
| 11.9102       | 0.1917 | 28500 | 2.8853          | 7471104000        |
| 11.9402       | 0.1951 | 29000 | 2.8851          | 7602176000        |
| 11.9125       | 0.1984 | 29500 | 2.8849          | 7733248000        |
| 11.9242       | 0.2018 | 30000 | 2.8849          | 7864320000        |
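The "Input Tokens Seen" column grows by exactly 131,072,000 tokens every 500 steps, i.e. 262,144 tokens per optimizer step. With the total train batch size of 256 sequences, that is consistent with a sequence length of 1024 tokens per sequence; this is an inference from the numbers, not something stated in the card. The check below just restates that arithmetic:

```python
tokens_per_interval = 131_072_000  # growth of "Input Tokens Seen" per eval interval
steps_per_interval = 500

tokens_per_step = tokens_per_interval // steps_per_interval  # 262144
seq_len = tokens_per_step // 256   # 1024, assuming total batch size 256
print(tokens_per_step, seq_len)
```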

Framework versions

  • Transformers 4.50.3
  • Pytorch 2.6.0+cu126
  • Datasets 3.5.0
  • Tokenizers 0.21.1