
moe_hom_100m

This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:

  • Loss: 4.7001
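
For scale, a cross-entropy loss of 4.7001 nats corresponds to a perplexity of about 110. A minimal sketch of the conversion, assuming the reported loss is the usual token-level causal-LM cross-entropy (the card does not state this explicitly):

```python
import math

# Perplexity is exp(loss) when the loss is mean cross-entropy in nats.
# Assumption: the reported eval loss is a token-level LM cross-entropy.
eval_loss = 4.7001
print(f"perplexity = {math.exp(eval_loss):.1f}")  # ~110.0
```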

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction as a Trainer configuration is sketched after this list):

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 32
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 66788
  • training_steps: 667880
  • mixed_precision_training: Native AMP
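
For readers who want to reproduce the run, the list above maps onto the Hugging Face `TrainingArguments` roughly as follows. This is a minimal sketch: `output_dir` is an assumption, and "Native AMP" is mapped to `fp16=True`, though bf16 AMP is also possible.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the hyperparameter list above; every value
# except output_dir and the fp16/bf16 choice is transcribed from the card.
args = TrainingArguments(
    output_dir="moe_hom_100m",      # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,  # 8 x 4 = 32 total train batch size
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=66_788,            # 10% of training_steps
    max_steps=667_880,
    fp16=True,                      # "Native AMP"
)
```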

Training results

| Training Loss | Epoch  | Step   | Validation Loss |
|:-------------:|:------:|:------:|:---------------:|
| 8.474         | 0.1497 | 10000  | 8.4434          |
| 7.3705        | 0.2995 | 20000  | 7.3390          |
| 6.4743        | 0.4492 | 30000  | 6.4405          |
| 5.8985        | 0.5989 | 40000  | 5.8659          |
| 5.5978        | 0.7486 | 50000  | 5.5786          |
| 5.4382        | 0.8984 | 60000  | 5.4064          |
| 5.2974        | 1.0481 | 70000  | 5.2621          |
| 5.165         | 1.1978 | 80000  | 5.1229          |
| 5.0265        | 1.3475 | 90000  | 5.0155          |
| 4.9568        | 1.4973 | 100000 | 4.9374          |
| 4.872         | 1.6470 | 110000 | 4.8702          |
| 4.814         | 1.7967 | 120000 | 4.8172          |
| 4.759         | 1.9465 | 130000 | 4.7736          |
| 4.5891        | 2.0962 | 140000 | 4.7425          |
| 4.5934        | 2.2459 | 150000 | 4.7119          |
| 4.5623        | 2.3956 | 160000 | 4.6824          |
| 4.5499        | 2.5454 | 170000 | 4.6549          |
| 4.5505        | 2.6951 | 180000 | 4.6280          |
| 4.5343        | 2.8448 | 190000 | 4.6039          |
| 4.5138        | 2.9945 | 200000 | 4.5791          |
| 4.3003        | 3.1443 | 210000 | 4.5858          |
| 4.324         | 3.2940 | 220000 | 4.5730          |
| 4.3332        | 3.4437 | 230000 | 4.5554          |
| 4.3436        | 3.5934 | 240000 | 4.5378          |
| 4.3339        | 3.7432 | 250000 | 4.5219          |
| 4.3071        | 3.8929 | 260000 | 4.5056          |
| 4.0475        | 4.0426 | 270000 | 4.5268          |
| 4.0936        | 4.1923 | 280000 | 4.5278          |
| 4.1204        | 4.3421 | 290000 | 4.5152          |
| 4.1392        | 4.4918 | 300000 | 4.5024          |
| 4.1562        | 4.6415 | 310000 | 4.4896          |
| 4.1714        | 4.7912 | 320000 | 4.4752          |
| 4.1695        | 4.9410 | 330000 | 4.4641          |
| 3.8707        | 5.0907 | 340000 | 4.5161          |
| 3.917         | 5.2404 | 350000 | 4.5157          |
| 3.9432        | 5.3901 | 360000 | 4.5082          |
| 3.9817        | 5.5399 | 370000 | 4.4955          |
| 3.9956        | 5.6896 | 380000 | 4.4823          |
| 3.9952        | 5.8393 | 390000 | 4.4715          |
| 4.015         | 5.9891 | 400000 | 4.4593          |
| 3.7174        | 6.1388 | 410000 | 4.5393          |
| 3.7456        | 6.2885 | 420000 | 4.5367          |
| 3.7796        | 6.4382 | 430000 | 4.5265          |
| 3.8109        | 6.5879 | 440000 | 4.5160          |
| 3.8051        | 6.7377 | 450000 | 4.5069          |
| 3.8109        | 6.8874 | 460000 | 4.4948          |
| 3.5003        | 7.0371 | 470000 | 4.5652          |
| 3.5381        | 7.1868 | 480000 | 4.5890          |
| 3.5896        | 7.3366 | 490000 | 4.5854          |
| 3.6105        | 7.4863 | 500000 | 4.5757          |
| 3.6093        | 7.6360 | 510000 | 4.5676          |
| 3.6397        | 7.7858 | 520000 | 4.5572          |
| 3.6258        | 7.9355 | 530000 | 4.5470          |
| 3.353         | 8.0852 | 540000 | 4.6341          |
| 3.3623        | 8.2349 | 550000 | 4.6469          |
| 3.4051        | 8.3846 | 560000 | 4.6475          |
| 3.4427        | 8.5344 | 570000 | 4.6403          |
| 3.4094        | 8.6841 | 580000 | 4.6347          |
| 3.4441        | 8.8338 | 590000 | 4.6281          |
| 3.437         | 8.9836 | 600000 | 4.6232          |
| 3.1943        | 9.1333 | 610000 | 4.6993          |
| 3.2034        | 9.2830 | 620000 | 4.7052          |
| 3.2252        | 9.4327 | 630000 | 4.7053          |
| 3.2246        | 9.5825 | 640000 | 4.7055          |
| 3.2353        | 9.7322 | 650000 | 4.7018          |
| 3.2326        | 9.8819 | 660000 | 4.7006          |
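
Note that the validation loss reaches its minimum of 4.4593 at step 400,000 (epoch ~6) and climbs steadily afterward while the training loss keeps falling, the classic overfitting pattern. A small sketch that picks the best checkpoint from (step, validation loss) pairs, abridged here but transcribed from the table above:

```python
import math

# Abridged (step, validation_loss) pairs transcribed from the table above.
history = [
    (10_000, 8.4434),
    (200_000, 4.5791),
    (330_000, 4.4641),
    (400_000, 4.4593),  # lowest validation loss in the full table
    (530_000, 4.5470),
    (660_000, 4.7006),
]

best_step, best_loss = min(history, key=lambda row: row[1])
print(f"best checkpoint: step {best_step}, "
      f"val loss {best_loss:.4f}, perplexity {math.exp(best_loss):.1f}")
```

Under early stopping on validation loss, the step-400,000 checkpoint would be preferred over the final state reported above (loss 4.7001).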

Framework versions

  • Transformers 4.51.0
  • Pytorch 2.7.0+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1

Model size: 0.3B params (safetensors, tensor type F32)
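
A minimal loading sketch, assuming the weights are a causal language model served through `transformers` (the card does not state the architecture, and the repo id below is a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "moe_hom_100m"  # hypothetical: substitute the actual hub id or local path

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```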