
moe_hom_100m

This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:

  • Loss: 4.7001
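
For scale, a cross-entropy loss of 4.7001 nats corresponds to a perplexity of about 110. A minimal sketch of the conversion, assuming the reported loss is the usual token-level causal-LM cross-entropy (the card does not state this explicitly):

```python
import math

# Perplexity is exp(loss) when the loss is mean cross-entropy in nats.
# Assumption: the reported eval loss is a token-level LM cross-entropy.
eval_loss = 4.7001
print(f"perplexity = {math.exp(eval_loss):.1f}")  # ~110.0
```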

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction as a Trainer configuration is sketched after this list):

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 32
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 66788
  • training_steps: 667880
  • mixed_precision_training: Native AMP
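
For readers who want to reproduce the run, the list above maps onto the Hugging Face `TrainingArguments` roughly as follows. This is a minimal sketch: `output_dir` is an assumption, and "Native AMP" is mapped to `fp16=True`, though bf16 AMP is also possible.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the hyperparameter list above; every value
# except output_dir and the fp16/bf16 choice is transcribed from the card.
args = TrainingArguments(
    output_dir="moe_hom_100m",      # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,  # 8 x 4 = 32 total train batch size
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=66_788,            # 10% of training_steps
    max_steps=667_880,
    fp16=True,                      # "Native AMP"
)
```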

Training results

| Training Loss | Epoch  | Step   | Validation Loss |
|:-------------:|:------:|:------:|:---------------:|
| 8.474         | 0.1497 | 10000  | 8.4434          |
| 7.3705        | 0.2995 | 20000  | 7.3390          |
| 6.4743        | 0.4492 | 30000  | 6.4405          |
| 5.8985        | 0.5989 | 40000  | 5.8659          |
| 5.5978        | 0.7486 | 50000  | 5.5786          |
| 5.4382        | 0.8984 | 60000  | 5.4064          |
| 5.2974        | 1.0481 | 70000  | 5.2621          |
| 5.165         | 1.1978 | 80000  | 5.1229          |
| 5.0265        | 1.3475 | 90000  | 5.0155          |
| 4.9568        | 1.4973 | 100000 | 4.9374          |
| 4.872         | 1.6470 | 110000 | 4.8702          |
| 4.814         | 1.7967 | 120000 | 4.8172          |
| 4.759         | 1.9465 | 130000 | 4.7736          |
| 4.5891        | 2.0962 | 140000 | 4.7425          |
| 4.5934        | 2.2459 | 150000 | 4.7119          |
| 4.5623        | 2.3956 | 160000 | 4.6824          |
| 4.5499        | 2.5454 | 170000 | 4.6549          |
| 4.5505        | 2.6951 | 180000 | 4.6280          |
| 4.5343        | 2.8448 | 190000 | 4.6039          |
| 4.5138        | 2.9945 | 200000 | 4.5791          |
| 4.3003        | 3.1443 | 210000 | 4.5858          |
| 4.324         | 3.2940 | 220000 | 4.5730          |
| 4.3332        | 3.4437 | 230000 | 4.5554          |
| 4.3436        | 3.5934 | 240000 | 4.5378          |
| 4.3339        | 3.7432 | 250000 | 4.5219          |
| 4.3071        | 3.8929 | 260000 | 4.5056          |
| 4.0475        | 4.0426 | 270000 | 4.5268          |
| 4.0936        | 4.1923 | 280000 | 4.5278          |
| 4.1204        | 4.3421 | 290000 | 4.5152          |
| 4.1392        | 4.4918 | 300000 | 4.5024          |
| 4.1562        | 4.6415 | 310000 | 4.4896          |
| 4.1714        | 4.7912 | 320000 | 4.4752          |
| 4.1695        | 4.9410 | 330000 | 4.4641          |
| 3.8707        | 5.0907 | 340000 | 4.5161          |
| 3.917         | 5.2404 | 350000 | 4.5157          |
| 3.9432        | 5.3901 | 360000 | 4.5082          |
| 3.9817        | 5.5399 | 370000 | 4.4955          |
| 3.9956        | 5.6896 | 380000 | 4.4823          |
| 3.9952        | 5.8393 | 390000 | 4.4715          |
| 4.015         | 5.9891 | 400000 | 4.4593          |
| 3.7174        | 6.1388 | 410000 | 4.5393          |
| 3.7456        | 6.2885 | 420000 | 4.5367          |
| 3.7796        | 6.4382 | 430000 | 4.5265          |
| 3.8109        | 6.5879 | 440000 | 4.5160          |
| 3.8051        | 6.7377 | 450000 | 4.5069          |
| 3.8109        | 6.8874 | 460000 | 4.4948          |
| 3.5003        | 7.0371 | 470000 | 4.5652          |
| 3.5381        | 7.1868 | 480000 | 4.5890          |
| 3.5896        | 7.3366 | 490000 | 4.5854          |
| 3.6105        | 7.4863 | 500000 | 4.5757          |
| 3.6093        | 7.6360 | 510000 | 4.5676          |
| 3.6397        | 7.7858 | 520000 | 4.5572          |
| 3.6258        | 7.9355 | 530000 | 4.5470          |
| 3.353         | 8.0852 | 540000 | 4.6341          |
| 3.3623        | 8.2349 | 550000 | 4.6469          |
| 3.4051        | 8.3846 | 560000 | 4.6475          |
| 3.4427        | 8.5344 | 570000 | 4.6403          |
| 3.4094        | 8.6841 | 580000 | 4.6347          |
| 3.4441        | 8.8338 | 590000 | 4.6281          |
| 3.437         | 8.9836 | 600000 | 4.6232          |
| 3.1943        | 9.1333 | 610000 | 4.6993          |
| 3.2034        | 9.2830 | 620000 | 4.7052          |
| 3.2252        | 9.4327 | 630000 | 4.7053          |
| 3.2246        | 9.5825 | 640000 | 4.7055          |
| 3.2353        | 9.7322 | 650000 | 4.7018          |
| 3.2326        | 9.8819 | 660000 | 4.7006          |
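
Note that the validation loss reaches its minimum of 4.4593 at step 400,000 (epoch ~6) and climbs steadily afterward while the training loss keeps falling, the classic overfitting pattern. A small sketch that picks the best checkpoint from (step, validation loss) pairs, abridged here but transcribed from the table above:

```python
import math

# Abridged (step, validation_loss) pairs transcribed from the table above.
history = [
    (10_000, 8.4434),
    (200_000, 4.5791),
    (330_000, 4.4641),
    (400_000, 4.4593),  # lowest validation loss in the full table
    (530_000, 4.5470),
    (660_000, 4.7006),
]

best_step, best_loss = min(history, key=lambda row: row[1])
print(f"best checkpoint: step {best_step}, "
      f"val loss {best_loss:.4f}, perplexity {math.exp(best_loss):.1f}")
```

Under early stopping on validation loss, the step-400,000 checkpoint would be preferred over the final state reported above (loss 4.7001).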

Framework versions

  • Transformers 4.51.0
  • Pytorch 2.7.0+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1

Model size: 0.3B params (safetensors, tensor type F32)
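
A minimal loading sketch, assuming the weights are a causal language model served through `transformers` (the card does not state the architecture, and the repo id below is a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "moe_hom_100m"  # hypothetical: substitute the actual hub id or local path

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```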