# genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-05_beta0.1_epoch8.0_42
This model is a fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) on the YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1 dataset. It achieves the following results on the evaluation set:
- Loss: 2.2562
- Rewards/chosen: -1.3534
- Rewards/rejected: 0.0
- Rewards/accuracies: 0.1750
- Rewards/margins: -1.3534
- Logps/rejected: -70.0574
- Logps/chosen: -43.6040
- Logits/rejected: -3.6546
- Logits/chosen: -3.5215
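For context on how the reward columns above are derived: in DPO, a response's "reward" is the β-scaled log-probability ratio between the policy and the frozen reference model, and the margin is the chosen-minus-rejected difference; "cdpo" in the model name presumably denotes the label-smoothed (conservative) DPO loss. A minimal sketch, assuming that interpretation — the function names and the log-probability values are illustrative, not taken from this run; β = 0.1 matches the model name:

```python
import math

BETA = 0.1  # matches beta0.1 in the model name


def implicit_rewards(logp_chosen, logp_rejected,
                     ref_logp_chosen, ref_logp_rejected, beta=BETA):
    """DPO's implicit rewards: beta-scaled policy/reference log-prob ratios."""
    reward_chosen = beta * (logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (logp_rejected - ref_logp_rejected)
    return reward_chosen, reward_rejected, reward_chosen - reward_rejected


def cdpo_loss(margin, eps=0.1):
    """Conservative DPO: label-smoothed sigmoid loss; eps=0 recovers plain DPO."""
    sig = lambda x: 1.0 / (1.0 + math.exp(-x))
    return -(1 - eps) * math.log(sig(margin)) - eps * math.log(sig(-margin))


# Illustrative numbers only: a policy whose chosen log-prob drifted 13.5 nats
# below the reference gets reward 0.1 * (-13.5) = -1.35; a rejected response
# scored identically to the reference gets reward 0.0.
c, r, m = implicit_rewards(-43.6, -70.1, -30.1, -70.1)
```

Note that Rewards/rejected is 0.0 at every logged step in the table below, i.e. the policy's log-probability on rejected responses never moved relative to the reference on the evaluation set.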
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 8.0
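The batch-size figures above are consistent with a per-device batch of 4 across 8 GPUs with no gradient accumulation, and the cosine schedule warms up linearly over the first 10% of steps before decaying to zero. A short sketch of that arithmetic and schedule — the total step count of ~1432 is inferred from the training log below, and the function name is illustrative:

```python
import math

# Effective batch size: per-device batch x devices x grad-accum steps.
per_device_batch, num_devices, grad_accum = 4, 8, 1
total_batch = per_device_batch * num_devices * grad_accum  # 32


def cosine_lr(step, total_steps=1432, warmup_ratio=0.1, peak_lr=1e-5):
    """Linear warmup over the first warmup_ratio of steps, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)  # 143 here
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The learning rate peaks at 1e-05 right as warmup ends (step 143, roughly epoch 0.8) and reaches zero at the final step.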
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5466 | 0.1117 | 20 | 0.5253 | 0.3756 | 0.0 | 1.0 | 0.3756 | -37.5665 | -26.3140 | -2.5322 | -2.6369 |
| 0.4059 | 0.2235 | 40 | 0.4084 | 0.9166 | 0.0 | 0.9000 | 0.9166 | -33.2775 | -20.9040 | -3.3833 | -3.3405 |
| 0.4138 | 0.3352 | 60 | 0.3843 | 0.9750 | 0.0 | 1.0 | 0.9750 | -32.0425 | -20.3199 | -3.3283 | -3.3019 |
| 0.3939 | 0.4469 | 80 | 0.4022 | 0.8901 | 0.0 | 0.9500 | 0.8901 | -32.4997 | -21.1695 | -3.3332 | -3.3071 |
| 0.4055 | 0.5587 | 100 | 0.4635 | 0.7532 | 0.0 | 0.8750 | 0.7532 | -34.6906 | -22.5383 | -3.3361 | -3.3059 |
| 0.4772 | 0.6704 | 120 | 0.5260 | 0.6294 | 0.0 | 0.8750 | 0.6294 | -36.9209 | -23.7756 | -3.2411 | -3.2326 |
| 0.5798 | 0.7821 | 140 | 0.6274 | 0.3853 | 0.0 | 0.7750 | 0.3853 | -38.5313 | -26.2172 | -3.0298 | -3.0459 |
| 0.8527 | 0.8939 | 160 | 0.7349 | 0.2142 | 0.0 | 0.6500 | 0.2142 | -40.1633 | -27.9280 | -2.9920 | -3.0142 |
| 0.3929 | 1.0056 | 180 | 0.7815 | 0.1006 | 0.0 | 0.6000 | 0.1006 | -41.8164 | -29.0638 | -3.0261 | -3.0338 |
| 0.3188 | 1.1173 | 200 | 0.9600 | -0.1511 | 0.0 | 0.3750 | -0.1511 | -43.9392 | -31.5815 | -3.1820 | -3.1691 |
| 0.5336 | 1.2291 | 220 | 0.9597 | -0.1903 | 0.0 | 0.4750 | -0.1903 | -45.7020 | -31.9727 | -3.0984 | -3.0886 |
| 0.4445 | 1.3408 | 240 | 1.0146 | -0.1911 | 0.0 | 0.4250 | -0.1911 | -46.0488 | -31.9815 | -3.2781 | -3.2460 |
| 0.6491 | 1.4525 | 260 | 0.9691 | -0.0833 | 0.0 | 0.5000 | -0.0833 | -44.8210 | -30.9032 | -3.1003 | -3.0880 |
| 0.5130 | 1.5642 | 280 | 0.9466 | -0.0891 | 0.0 | 0.4250 | -0.0891 | -45.4357 | -30.9608 | -3.0974 | -3.0767 |
| 0.4424 | 1.6760 | 300 | 0.9961 | -0.2216 | 0.0 | 0.3250 | -0.2216 | -45.7439 | -32.2863 | -3.1617 | -3.1317 |
| 0.5232 | 1.7877 | 320 | 0.9826 | -0.1976 | 0.0 | 0.3500 | -0.1976 | -45.1696 | -32.0464 | -3.0142 | -3.0153 |
| 0.5069 | 1.8994 | 340 | 0.9945 | -0.0783 | 0.0 | 0.4500 | -0.0783 | -46.3133 | -30.8527 | -3.1964 | -3.1637 |
| 0.2684 | 2.0112 | 360 | 1.0036 | -0.0981 | 0.0 | 0.5000 | -0.0981 | -45.3146 | -31.0515 | -3.0113 | -3.0037 |
| 0.3519 | 2.1229 | 380 | 1.1985 | -0.3804 | 0.0 | 0.3500 | -0.3804 | -48.5596 | -33.8738 | -3.1878 | -3.1420 |
| 0.3155 | 2.2346 | 400 | 1.1909 | -0.3289 | 0.0 | 0.3250 | -0.3289 | -48.8887 | -33.3588 | -3.1928 | -3.1583 |
| 0.3517 | 2.3464 | 420 | 1.2317 | -0.4161 | 0.0 | 0.3250 | -0.4161 | -51.7570 | -34.2306 | -3.1170 | -3.0905 |
| 0.2643 | 2.4581 | 440 | 1.2229 | -0.4421 | 0.0 | 0.3000 | -0.4421 | -49.8322 | -34.4914 | -3.0995 | -3.0788 |
| 0.2984 | 2.5698 | 460 | 1.1842 | -0.4098 | 0.0 | 0.3250 | -0.4098 | -47.7830 | -34.1680 | -2.9851 | -2.9781 |
| 0.2776 | 2.6816 | 480 | 1.2433 | -0.4190 | 0.0 | 0.3250 | -0.4190 | -48.9492 | -34.2604 | -3.0807 | -3.0566 |
| 0.2270 | 2.7933 | 500 | 1.2486 | -0.3790 | 0.0 | 0.3250 | -0.3790 | -49.0447 | -33.8596 | -3.0917 | -3.0775 |
| 0.3012 | 2.9050 | 520 | 1.1795 | -0.2955 | 0.0 | 0.3500 | -0.2955 | -49.5190 | -33.0246 | -3.0727 | -3.0613 |
| 0.2164 | 3.0168 | 540 | 1.2305 | -0.4498 | 0.0 | 0.3500 | -0.4498 | -48.9411 | -34.5676 | -3.1648 | -3.1365 |
| 0.2851 | 3.1285 | 560 | 1.4187 | -0.5654 | 0.0 | 0.2250 | -0.5654 | -52.5012 | -35.7240 | -3.3471 | -3.2835 |
| 0.2124 | 3.2402 | 580 | 1.3477 | -0.5484 | 0.0 | 0.2250 | -0.5484 | -52.8059 | -35.5544 | -3.2139 | -3.1734 |
| 0.2053 | 3.3520 | 600 | 1.4134 | -0.6614 | 0.0 | 0.2250 | -0.6614 | -53.6831 | -36.6840 | -3.2238 | -3.1798 |
| 0.3601 | 3.4637 | 620 | 1.4628 | -0.7559 | 0.0 | 0.2250 | -0.7559 | -54.5884 | -37.6292 | -3.2911 | -3.2421 |
| 0.1913 | 3.5754 | 640 | 1.4378 | -0.6411 | 0.0 | 0.2250 | -0.6411 | -53.7149 | -36.4812 | -3.2762 | -3.2227 |
| 0.2378 | 3.6872 | 660 | 1.4228 | -0.7470 | 0.0 | 0.2000 | -0.7470 | -54.1003 | -37.5405 | -3.2093 | -3.1707 |
| 0.2627 | 3.7989 | 680 | 1.4170 | -0.5635 | 0.0 | 0.2250 | -0.5635 | -53.7126 | -35.7054 | -3.2556 | -3.2116 |
| 0.1742 | 3.9106 | 700 | 1.4822 | -0.6458 | 0.0 | 0.1500 | -0.6458 | -54.2694 | -36.5282 | -3.2417 | -3.1927 |
| 0.1349 | 4.0223 | 720 | 1.4830 | -0.6865 | 0.0 | 0.2000 | -0.6865 | -55.6546 | -36.9353 | -3.3105 | -3.2496 |
| 0.1227 | 4.1341 | 740 | 1.6660 | -0.9048 | 0.0 | 0.1500 | -0.9048 | -58.4858 | -39.1180 | -3.4280 | -3.3460 |
| 0.1293 | 4.2458 | 760 | 1.5319 | -0.7418 | 0.0 | 0.2000 | -0.7418 | -55.8039 | -37.4880 | -3.3607 | -3.2969 |
| 0.1172 | 4.3575 | 780 | 1.5251 | -0.7269 | 0.0 | 0.2250 | -0.7269 | -55.8217 | -37.3393 | -3.3338 | -3.2761 |
| 0.1683 | 4.4693 | 800 | 1.5403 | -0.7640 | 0.0 | 0.1500 | -0.7640 | -55.9445 | -37.7102 | -3.3597 | -3.2995 |
| 0.1904 | 4.5810 | 820 | 1.5914 | -0.8601 | 0.0 | 0.1750 | -0.8601 | -56.9242 | -38.6707 | -3.3721 | -3.3039 |
| 0.2634 | 4.6927 | 840 | 1.4943 | -0.7421 | 0.0 | 0.2000 | -0.7421 | -55.4652 | -37.4915 | -3.3267 | -3.2725 |
| 0.1922 | 4.8045 | 860 | 1.4566 | -0.7606 | 0.0 | 0.1250 | -0.7606 | -55.5023 | -37.6765 | -3.3876 | -3.3162 |
| 0.1825 | 4.9162 | 880 | 1.4759 | -0.7833 | 0.0 | 0.1250 | -0.7833 | -54.1260 | -37.9035 | -3.3873 | -3.3143 |
| 0.1293 | 5.0279 | 900 | 1.4924 | -0.7656 | 0.0 | 0.1500 | -0.7656 | -55.2890 | -37.7256 | -3.4025 | -3.3230 |
| 0.0969 | 5.1397 | 920 | 1.8492 | -1.0069 | 0.0 | 0.1250 | -1.0069 | -61.5237 | -40.1392 | -3.4959 | -3.4000 |
| 0.1006 | 5.2514 | 940 | 1.6623 | -0.8035 | 0.0 | 0.1500 | -0.8035 | -58.7658 | -38.1047 | -3.4757 | -3.3889 |
| 0.1129 | 5.3631 | 960 | 1.6896 | -0.8444 | 0.0 | 0.1750 | -0.8444 | -59.1995 | -38.5144 | -3.5036 | -3.4093 |
| 0.1248 | 5.4749 | 980 | 1.9180 | -1.1702 | 0.0 | 0.1000 | -1.1702 | -62.2443 | -41.7719 | -3.5884 | -3.4733 |
| 0.1063 | 5.5866 | 1000 | 1.7638 | -0.9265 | 0.0 | 0.1500 | -0.9265 | -59.8830 | -39.3352 | -3.5476 | -3.4446 |
| 0.1929 | 5.6983 | 1020 | 1.7843 | -0.9282 | 0.0 | 0.1250 | -0.9282 | -61.1666 | -39.3518 | -3.5739 | -3.4656 |
| 0.1503 | 5.8101 | 1040 | 1.6424 | -0.8562 | 0.0 | 0.1250 | -0.8562 | -59.2886 | -38.6320 | -3.5519 | -3.4453 |
| 0.1370 | 5.9218 | 1060 | 1.6859 | -0.7688 | 0.0 | 0.1500 | -0.7688 | -60.2979 | -37.7578 | -3.5396 | -3.4304 |
| 0.0841 | 6.0335 | 1080 | 1.7235 | -0.8587 | 0.0 | 0.1750 | -0.8587 | -61.1482 | -38.6569 | -3.5664 | -3.4532 |
| 0.0798 | 6.1453 | 1100 | 2.1241 | -1.2464 | 0.0 | 0.1500 | -1.2464 | -67.2607 | -42.5339 | -3.6434 | -3.5114 |
| 0.0996 | 6.2570 | 1120 | 2.1727 | -1.3262 | 0.0 | 0.1000 | -1.3262 | -68.6685 | -43.3320 | -3.6476 | -3.5174 |
| 0.1140 | 6.3687 | 1140 | 2.1072 | -1.2928 | 0.0 | 0.1000 | -1.2928 | -67.2377 | -42.9979 | -3.6253 | -3.4997 |
| 0.0937 | 6.4804 | 1160 | 2.0897 | -1.3032 | 0.0 | 0.1000 | -1.3032 | -67.8363 | -43.1023 | -3.6172 | -3.4898 |
| 0.0977 | 6.5922 | 1180 | 2.1033 | -1.2756 | 0.0 | 0.1250 | -1.2756 | -67.8044 | -42.8260 | -3.6423 | -3.5153 |
| 0.0816 | 6.7039 | 1200 | 2.0782 | -1.2335 | 0.0 | 0.1250 | -1.2335 | -66.4155 | -42.4054 | -3.6321 | -3.5036 |
| 0.0821 | 6.8156 | 1220 | 2.0229 | -1.1573 | 0.0 | 0.1750 | -1.1573 | -66.1469 | -41.6429 | -3.6116 | -3.4854 |
| 0.1400 | 6.9274 | 1240 | 2.0659 | -1.1567 | 0.0 | 0.1500 | -1.1567 | -66.6871 | -41.6369 | -3.6213 | -3.4943 |
| 0.0874 | 7.0391 | 1260 | 2.0813 | -1.1806 | 0.0 | 0.1750 | -1.1806 | -67.1844 | -41.8758 | -3.6316 | -3.5036 |
| 0.1099 | 7.1508 | 1280 | 2.1351 | -1.2467 | 0.0 | 0.1500 | -1.2467 | -67.7858 | -42.5375 | -3.6372 | -3.5070 |
| 0.0759 | 7.2626 | 1300 | 2.1856 | -1.3011 | 0.0 | 0.1500 | -1.3011 | -68.6701 | -43.0811 | -3.6452 | -3.5134 |
| 0.0823 | 7.3743 | 1320 | 2.2192 | -1.3394 | 0.0 | 0.1500 | -1.3394 | -69.4521 | -43.4637 | -3.6450 | -3.5105 |
| 0.1278 | 7.4860 | 1340 | 2.2384 | -1.3569 | 0.0 | 0.1500 | -1.3569 | -69.7412 | -43.6389 | -3.6481 | -3.5135 |
| 0.1139 | 7.5978 | 1360 | 2.2508 | -1.3602 | 0.0 | 0.1500 | -1.3602 | -69.8968 | -43.6718 | -3.6591 | -3.5274 |
| 0.0792 | 7.7095 | 1380 | 2.2559 | -1.3643 | 0.0 | 0.1500 | -1.3643 | -69.9685 | -43.7135 | -3.6531 | -3.5199 |
| 0.1072 | 7.8212 | 1400 | 2.2601 | -1.3847 | 0.0 | 0.1500 | -1.3847 | -69.9992 | -43.9171 | -3.6523 | -3.5175 |
| 0.1035 | 7.9330 | 1420 | 2.2658 | -1.3596 | 0.0 | 0.1750 | -1.3596 | -69.9729 | -43.6664 | -3.6490 | -3.5141 |
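The validation loss bottoms out near step 60 (0.3843, about a third of an epoch) and climbs steadily after roughly epoch 1, while reward accuracy falls from 1.0 to ~0.15 — a classic overfitting signature. If intermediate checkpoints were saved, selecting by validation loss would favor the early ones; a minimal sketch over a few rows sampled from the table above:

```python
# (step, validation_loss) pairs sampled from the training log above.
log = [
    (20, 0.5253), (40, 0.4084), (60, 0.3843), (80, 0.4022),
    (180, 0.7815), (540, 1.2305), (900, 1.4924), (1420, 2.2658),
]

# Pick the checkpoint with the lowest validation loss.
best_step, best_loss = min(log, key=lambda row: row[1])
# best_step == 60, best_loss == 0.3843
```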
## Framework versions
- Transformers 4.45.2
- Pytorch 2.5.1+cu121
- Datasets 3.5.0
- Tokenizers 0.20.3