genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-05_beta0.1_epoch8.0_42

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1 dataset. It achieves the following results on the evaluation set:

  • Loss: 2.2562
  • Rewards/chosen: -1.3534
  • Rewards/rejected: 0.0
  • Rewards/accuracies: 0.1750
  • Rewards/margins: -1.3534
  • Logps/rejected: -70.0574
  • Logps/chosen: -43.6040
  • Logits/rejected: -3.6546
  • Logits/chosen: -3.5215
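
The reward columns above (and in the table below) are read here under the usual DPO logging convention, with β = 0.1 taken from the run name: the implicit reward of a response is the β-scaled log-probability ratio between the trained policy and the frozen reference model, and Rewards/accuracies is the fraction of evaluation pairs whose chosen reward exceeds the rejected one. Stated as formulas (an assumption about this run's logging, not something the card spells out):

```latex
r_\theta(x, y) = \beta \,\bigl(\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\bigr),
\qquad
\text{margins} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
```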

Model description

More information needed

Intended uses & limitations

More information needed
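
Pending a fuller description, the checkpoint loads like any Qwen2.5-Instruct-style causal LM. A minimal inference sketch using the Transformers API (the repo id is taken from this card; the prompt, dtype, and generation settings are illustrative, not tuned):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YuchenLi01/genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-05_beta0.1_epoch8.0_42"

# Load the fine-tuned checkpoint; bfloat16 matches the stored weights.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

# Qwen2.5-Instruct checkpoints ship a chat template; apply it before generating.
messages = [{"role": "user", "content": "What is 12 * 17? Show your reasoning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```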

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 8.0
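
The run name ("cdpo", β = 0.1) suggests a conservative-DPO objective, and the hyperparameters above map naturally onto TRL's DPOTrainer. The following is a hypothetical reconstruction, not the authors' actual script: TRL is not named on this card, the label_smoothing value (cDPO's smoothing weight in TRL) is a guess, and the dataset split names are assumed.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Qwen/Qwen2.5-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Preference pairs named on this card; split names are assumed.
dataset = load_dataset("YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1")

# Mirrors the listed hyperparameters: 4 per device x 8 GPUs = 32 effective batch.
args = DPOConfig(
    output_dir="genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-05_beta0.1_epoch8.0_42",
    beta=0.1,                      # from the run name
    label_smoothing=0.1,           # hypothetical: nonzero smoothing turns DPO into cDPO in TRL
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=8.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                     # assumption, consistent with the BF16 checkpoint
)

trainer = DPOTrainer(
    model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,           # TRL <= 0.11 API; newer versions use processing_class
)
trainer.train()
```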

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.5466 0.1117 20 0.5253 0.3756 0.0 1.0 0.3756 -37.5665 -26.3140 -2.5322 -2.6369
0.4059 0.2235 40 0.4084 0.9166 0.0 0.9000 0.9166 -33.2775 -20.9040 -3.3833 -3.3405
0.4138 0.3352 60 0.3843 0.9750 0.0 1.0 0.9750 -32.0425 -20.3199 -3.3283 -3.3019
0.3939 0.4469 80 0.4022 0.8901 0.0 0.9500 0.8901 -32.4997 -21.1695 -3.3332 -3.3071
0.4055 0.5587 100 0.4635 0.7532 0.0 0.875 0.7532 -34.6906 -22.5383 -3.3361 -3.3059
0.4772 0.6704 120 0.5260 0.6294 0.0 0.875 0.6294 -36.9209 -23.7756 -3.2411 -3.2326
0.5798 0.7821 140 0.6274 0.3853 0.0 0.7750 0.3853 -38.5313 -26.2172 -3.0298 -3.0459
0.8527 0.8939 160 0.7349 0.2142 0.0 0.6500 0.2142 -40.1633 -27.9280 -2.9920 -3.0142
0.3929 1.0056 180 0.7815 0.1006 0.0 0.6000 0.1006 -41.8164 -29.0638 -3.0261 -3.0338
0.3188 1.1173 200 0.9600 -0.1511 0.0 0.375 -0.1511 -43.9392 -31.5815 -3.1820 -3.1691
0.5336 1.2291 220 0.9597 -0.1903 0.0 0.4750 -0.1903 -45.7020 -31.9727 -3.0984 -3.0886
0.4445 1.3408 240 1.0146 -0.1911 0.0 0.4250 -0.1911 -46.0488 -31.9815 -3.2781 -3.2460
0.6491 1.4525 260 0.9691 -0.0833 0.0 0.5 -0.0833 -44.8210 -30.9032 -3.1003 -3.0880
0.513 1.5642 280 0.9466 -0.0891 0.0 0.4250 -0.0891 -45.4357 -30.9608 -3.0974 -3.0767
0.4424 1.6760 300 0.9961 -0.2216 0.0 0.3250 -0.2216 -45.7439 -32.2863 -3.1617 -3.1317
0.5232 1.7877 320 0.9826 -0.1976 0.0 0.3500 -0.1976 -45.1696 -32.0464 -3.0142 -3.0153
0.5069 1.8994 340 0.9945 -0.0783 0.0 0.4500 -0.0783 -46.3133 -30.8527 -3.1964 -3.1637
0.2684 2.0112 360 1.0036 -0.0981 0.0 0.5 -0.0981 -45.3146 -31.0515 -3.0113 -3.0037
0.3519 2.1229 380 1.1985 -0.3804 0.0 0.3500 -0.3804 -48.5596 -33.8738 -3.1878 -3.1420
0.3155 2.2346 400 1.1909 -0.3289 0.0 0.3250 -0.3289 -48.8887 -33.3588 -3.1928 -3.1583
0.3517 2.3464 420 1.2317 -0.4161 0.0 0.3250 -0.4161 -51.7570 -34.2306 -3.1170 -3.0905
0.2643 2.4581 440 1.2229 -0.4421 0.0 0.3000 -0.4421 -49.8322 -34.4914 -3.0995 -3.0788
0.2984 2.5698 460 1.1842 -0.4098 0.0 0.3250 -0.4098 -47.7830 -34.1680 -2.9851 -2.9781
0.2776 2.6816 480 1.2433 -0.4190 0.0 0.3250 -0.4190 -48.9492 -34.2604 -3.0807 -3.0566
0.227 2.7933 500 1.2486 -0.3790 0.0 0.3250 -0.3790 -49.0447 -33.8596 -3.0917 -3.0775
0.3012 2.9050 520 1.1795 -0.2955 0.0 0.3500 -0.2955 -49.5190 -33.0246 -3.0727 -3.0613
0.2164 3.0168 540 1.2305 -0.4498 0.0 0.3500 -0.4498 -48.9411 -34.5676 -3.1648 -3.1365
0.2851 3.1285 560 1.4187 -0.5654 0.0 0.2250 -0.5654 -52.5012 -35.7240 -3.3471 -3.2835
0.2124 3.2402 580 1.3477 -0.5484 0.0 0.2250 -0.5484 -52.8059 -35.5544 -3.2139 -3.1734
0.2053 3.3520 600 1.4134 -0.6614 0.0 0.2250 -0.6614 -53.6831 -36.6840 -3.2238 -3.1798
0.3601 3.4637 620 1.4628 -0.7559 0.0 0.2250 -0.7559 -54.5884 -37.6292 -3.2911 -3.2421
0.1913 3.5754 640 1.4378 -0.6411 0.0 0.2250 -0.6411 -53.7149 -36.4812 -3.2762 -3.2227
0.2378 3.6872 660 1.4228 -0.7470 0.0 0.2000 -0.7470 -54.1003 -37.5405 -3.2093 -3.1707
0.2627 3.7989 680 1.4170 -0.5635 0.0 0.2250 -0.5635 -53.7126 -35.7054 -3.2556 -3.2116
0.1742 3.9106 700 1.4822 -0.6458 0.0 0.1500 -0.6458 -54.2694 -36.5282 -3.2417 -3.1927
0.1349 4.0223 720 1.4830 -0.6865 0.0 0.2000 -0.6865 -55.6546 -36.9353 -3.3105 -3.2496
0.1227 4.1341 740 1.6660 -0.9048 0.0 0.1500 -0.9048 -58.4858 -39.1180 -3.4280 -3.3460
0.1293 4.2458 760 1.5319 -0.7418 0.0 0.2000 -0.7418 -55.8039 -37.4880 -3.3607 -3.2969
0.1172 4.3575 780 1.5251 -0.7269 0.0 0.2250 -0.7269 -55.8217 -37.3393 -3.3338 -3.2761
0.1683 4.4693 800 1.5403 -0.7640 0.0 0.1500 -0.7640 -55.9445 -37.7102 -3.3597 -3.2995
0.1904 4.5810 820 1.5914 -0.8601 0.0 0.1750 -0.8601 -56.9242 -38.6707 -3.3721 -3.3039
0.2634 4.6927 840 1.4943 -0.7421 0.0 0.2000 -0.7421 -55.4652 -37.4915 -3.3267 -3.2725
0.1922 4.8045 860 1.4566 -0.7606 0.0 0.125 -0.7606 -55.5023 -37.6765 -3.3876 -3.3162
0.1825 4.9162 880 1.4759 -0.7833 0.0 0.125 -0.7833 -54.1260 -37.9035 -3.3873 -3.3143
0.1293 5.0279 900 1.4924 -0.7656 0.0 0.1500 -0.7656 -55.2890 -37.7256 -3.4025 -3.3230
0.0969 5.1397 920 1.8492 -1.0069 0.0 0.125 -1.0069 -61.5237 -40.1392 -3.4959 -3.4000
0.1006 5.2514 940 1.6623 -0.8035 0.0 0.1500 -0.8035 -58.7658 -38.1047 -3.4757 -3.3889
0.1129 5.3631 960 1.6896 -0.8444 0.0 0.1750 -0.8444 -59.1995 -38.5144 -3.5036 -3.4093
0.1248 5.4749 980 1.9180 -1.1702 0.0 0.1000 -1.1702 -62.2443 -41.7719 -3.5884 -3.4733
0.1063 5.5866 1000 1.7638 -0.9265 0.0 0.1500 -0.9265 -59.8830 -39.3352 -3.5476 -3.4446
0.1929 5.6983 1020 1.7843 -0.9282 0.0 0.125 -0.9282 -61.1666 -39.3518 -3.5739 -3.4656
0.1503 5.8101 1040 1.6424 -0.8562 0.0 0.125 -0.8562 -59.2886 -38.6320 -3.5519 -3.4453
0.137 5.9218 1060 1.6859 -0.7688 0.0 0.1500 -0.7688 -60.2979 -37.7578 -3.5396 -3.4304
0.0841 6.0335 1080 1.7235 -0.8587 0.0 0.1750 -0.8587 -61.1482 -38.6569 -3.5664 -3.4532
0.0798 6.1453 1100 2.1241 -1.2464 0.0 0.1500 -1.2464 -67.2607 -42.5339 -3.6434 -3.5114
0.0996 6.2570 1120 2.1727 -1.3262 0.0 0.1000 -1.3262 -68.6685 -43.3320 -3.6476 -3.5174
0.114 6.3687 1140 2.1072 -1.2928 0.0 0.1000 -1.2928 -67.2377 -42.9979 -3.6253 -3.4997
0.0937 6.4804 1160 2.0897 -1.3032 0.0 0.1000 -1.3032 -67.8363 -43.1023 -3.6172 -3.4898
0.0977 6.5922 1180 2.1033 -1.2756 0.0 0.125 -1.2756 -67.8044 -42.8260 -3.6423 -3.5153
0.0816 6.7039 1200 2.0782 -1.2335 0.0 0.125 -1.2335 -66.4155 -42.4054 -3.6321 -3.5036
0.0821 6.8156 1220 2.0229 -1.1573 0.0 0.1750 -1.1573 -66.1469 -41.6429 -3.6116 -3.4854
0.14 6.9274 1240 2.0659 -1.1567 0.0 0.1500 -1.1567 -66.6871 -41.6369 -3.6213 -3.4943
0.0874 7.0391 1260 2.0813 -1.1806 0.0 0.1750 -1.1806 -67.1844 -41.8758 -3.6316 -3.5036
0.1099 7.1508 1280 2.1351 -1.2467 0.0 0.1500 -1.2467 -67.7858 -42.5375 -3.6372 -3.5070
0.0759 7.2626 1300 2.1856 -1.3011 0.0 0.1500 -1.3011 -68.6701 -43.0811 -3.6452 -3.5134
0.0823 7.3743 1320 2.2192 -1.3394 0.0 0.1500 -1.3394 -69.4521 -43.4637 -3.6450 -3.5105
0.1278 7.4860 1340 2.2384 -1.3569 0.0 0.1500 -1.3569 -69.7412 -43.6389 -3.6481 -3.5135
0.1139 7.5978 1360 2.2508 -1.3602 0.0 0.1500 -1.3602 -69.8968 -43.6718 -3.6591 -3.5274
0.0792 7.7095 1380 2.2559 -1.3643 0.0 0.1500 -1.3643 -69.9685 -43.7135 -3.6531 -3.5199
0.1072 7.8212 1400 2.2601 -1.3847 0.0 0.1500 -1.3847 -69.9992 -43.9171 -3.6523 -3.5175
0.1035 7.9330 1420 2.2658 -1.3596 0.0 0.1750 -1.3596 -69.9729 -43.6664 -3.6490 -3.5141

Framework versions

  • Transformers 4.45.2
  • PyTorch 2.5.1+cu121
  • Datasets 3.5.0
  • Tokenizers 0.20.3