# genv3pair1NoGT_1.5B_cdpo_ebs32_lr5e-07_beta0.1_epoch8.0_42

This model is a fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) on the [YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1](https://huggingface.co/datasets/YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1) dataset. It achieves the following results on the evaluation set:
- Loss: 0.3658
- Rewards/chosen: 1.1075
- Rewards/rejected: 0.0
- Rewards/accuracies: 0.9000
- Rewards/margins: 1.1075
- Logps/rejected: -33.0278
- Logps/chosen: -18.9948
- Logits/rejected: -3.5375
- Logits/chosen: -3.4431
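
To try the model, a minimal inference sketch with 🤗 Transformers follows. The repo id is an assumption based on this card's title (placed under the same namespace as the dataset), and the prompt is only an illustrative MATH-style question.

```python
# Minimal inference sketch; the repo id below is an assumption based on this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YuchenLi01/genv3pair1NoGT_1.5B_cdpo_ebs32_lr5e-07_beta0.1_epoch8.0_42"  # assumed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Qwen2.5-Instruct checkpoints ship a chat template; use it when prompting.
messages = [{"role": "user", "content": "Compute the sum of the first 10 positive odd integers."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```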
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 8.0
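
The per-device batch size of 4 across 8 GPUs gives the effective batch size of 32 (4 × 8, with no gradient accumulation). The run name indicates cDPO (conservative DPO) with β = 0.1, which trl implements as the sigmoid DPO loss with non-zero label smoothing. A hedged reproduction sketch is below; the `label_smoothing` value and the eval split name are assumptions, as neither is recorded in this card.

```python
# Hedged reproduction sketch with trl's DPOTrainer (argument names per trl ~0.11,
# which pairs with Transformers 4.45; details vary across trl versions).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "Qwen/Qwen2.5-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

dataset = load_dataset(
    "YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1"
)

config = DPOConfig(
    output_dir="genv3pair1NoGT_1.5B_cdpo_ebs32_lr5e-07_beta0.1_epoch8.0_42",
    beta=0.1,                       # from the run name
    loss_type="sigmoid",
    label_smoothing=0.1,            # assumed; cDPO = sigmoid DPO loss with smoothing > 0
    learning_rate=5e-07,
    per_device_train_batch_size=4,  # x 8 GPUs = effective batch size 32
    per_device_eval_batch_size=4,
    num_train_epochs=8.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],   # assumed split name
    tokenizer=tokenizer,
)
trainer.train()
```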
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6933 | 0.1117 | 20 | 0.6927 | -0.0050 | 0.0 | 0.4250 | -0.0050 | -41.6216 | -30.1197 | -2.2278 | -2.3742 |
| 0.6769 | 0.2235 | 40 | 0.6772 | 0.0340 | 0.0 | 0.8000 | 0.0340 | -41.3389 | -29.7304 | -2.2451 | -2.3885 |
| 0.6325 | 0.3352 | 60 | 0.6327 | 0.1201 | 0.0 | 0.9750 | 0.1201 | -40.3433 | -28.8695 | -2.3126 | -2.4487 |
| 0.552 | 0.4469 | 80 | 0.5472 | 0.3232 | 0.0 | 1.0 | 0.3232 | -38.1920 | -26.8384 | -2.4785 | -2.5917 |
| 0.4166 | 0.5587 | 100 | 0.4622 | 0.5449 | 0.0 | 1.0 | 0.5449 | -35.6187 | -24.6215 | -2.7023 | -2.7853 |
| 0.4124 | 0.6704 | 120 | 0.4133 | 0.7154 | 0.0 | 0.9750 | 0.7154 | -34.0244 | -22.9157 | -2.9427 | -2.9959 |
| 0.4072 | 0.7821 | 140 | 0.3868 | 0.8328 | 0.0 | 0.9750 | 0.8328 | -33.0483 | -21.7421 | -3.0564 | -3.0879 |
| 0.3682 | 0.8939 | 160 | 0.3694 | 0.9289 | 0.0 | 0.9750 | 0.9289 | -32.4181 | -20.7810 | -3.1688 | -3.1797 |
| 0.3347 | 1.0056 | 180 | 0.3606 | 0.9708 | 0.0 | 0.9750 | 0.9708 | -32.0167 | -20.3620 | -3.2109 | -3.2133 |
| 0.3428 | 1.1173 | 200 | 0.3570 | 0.9820 | 0.0 | 0.9750 | 0.9820 | -31.8110 | -20.2500 | -3.2368 | -3.2277 |
| 0.3198 | 1.2291 | 220 | 0.3539 | 0.9979 | 0.0 | 0.9750 | 0.9979 | -31.5782 | -20.0911 | -3.2694 | -3.2536 |
| 0.3279 | 1.3408 | 240 | 0.3513 | 1.0271 | 0.0 | 0.9750 | 1.0271 | -31.4944 | -19.7987 | -3.2925 | -3.2709 |
| 0.3299 | 1.4525 | 260 | 0.3501 | 1.0274 | 0.0 | 0.9750 | 1.0274 | -31.5820 | -19.7956 | -3.3043 | -3.2821 |
| 0.3102 | 1.5642 | 280 | 0.3456 | 1.0467 | 0.0 | 0.9750 | 1.0467 | -31.2429 | -19.6030 | -3.3094 | -3.2909 |
| 0.3649 | 1.6760 | 300 | 0.3460 | 1.0501 | 0.0 | 0.9750 | 1.0501 | -31.2992 | -19.5688 | -3.3220 | -3.2938 |
| 0.3392 | 1.7877 | 320 | 0.3441 | 1.0555 | 0.0 | 0.9750 | 1.0555 | -31.2234 | -19.5146 | -3.3241 | -3.2951 |
| 0.3424 | 1.8994 | 340 | 0.3420 | 1.0586 | 0.0 | 0.9750 | 1.0586 | -31.1263 | -19.4839 | -3.3264 | -3.2999 |
| 0.2988 | 2.0112 | 360 | 0.3409 | 1.0713 | 0.0 | 0.9750 | 1.0713 | -30.9231 | -19.3570 | -3.3451 | -3.3142 |
| 0.2874 | 2.1229 | 380 | 0.3425 | 1.0877 | 0.0 | 0.9750 | 1.0877 | -30.9979 | -19.1933 | -3.3711 | -3.3294 |
| 0.2657 | 2.2346 | 400 | 0.3420 | 1.0972 | 0.0 | 0.9750 | 1.0972 | -31.0448 | -19.0985 | -3.3937 | -3.3495 |
| 0.306 | 2.3464 | 420 | 0.3429 | 1.1068 | 0.0 | 0.9750 | 1.1068 | -31.1971 | -19.0020 | -3.4022 | -3.3514 |
| 0.2743 | 2.4581 | 440 | 0.3409 | 1.1059 | 0.0 | 0.9750 | 1.1059 | -31.2159 | -19.0107 | -3.3961 | -3.3463 |
| 0.2916 | 2.5698 | 460 | 0.3394 | 1.1093 | 0.0 | 0.9750 | 1.1093 | -31.0894 | -18.9768 | -3.4099 | -3.3627 |
| 0.2729 | 2.6816 | 480 | 0.3397 | 1.1164 | 0.0 | 0.9750 | 1.1164 | -31.1564 | -18.9062 | -3.4079 | -3.3561 |
| 0.2424 | 2.7933 | 500 | 0.3375 | 1.1261 | 0.0 | 0.9750 | 1.1261 | -31.1838 | -18.8088 | -3.4113 | -3.3589 |
| 0.2642 | 2.9050 | 520 | 0.3409 | 1.1068 | 0.0 | 0.9750 | 1.1068 | -31.2621 | -19.0017 | -3.4265 | -3.3712 |
| 0.2717 | 3.0168 | 540 | 0.3375 | 1.1340 | 0.0 | 0.9750 | 1.1340 | -31.1489 | -18.7302 | -3.4243 | -3.3692 |
| 0.2608 | 3.1285 | 560 | 0.3406 | 1.1287 | 0.0 | 0.9750 | 1.1287 | -31.4594 | -18.7828 | -3.4436 | -3.3809 |
| 0.2332 | 3.2402 | 580 | 0.3423 | 1.1310 | 0.0 | 0.9500 | 1.1310 | -31.4343 | -18.7603 | -3.4628 | -3.3948 |
| 0.2332 | 3.3520 | 600 | 0.3404 | 1.1220 | 0.0 | 0.9500 | 1.1220 | -31.3091 | -18.8498 | -3.4676 | -3.4024 |
| 0.2239 | 3.4637 | 620 | 0.3416 | 1.1348 | 0.0 | 0.9500 | 1.1348 | -31.3410 | -18.7218 | -3.4763 | -3.4094 |
| 0.2347 | 3.5754 | 640 | 0.3426 | 1.1295 | 0.0 | 0.9500 | 1.1295 | -31.4972 | -18.7749 | -3.4646 | -3.3927 |
| 0.2427 | 3.6872 | 660 | 0.3427 | 1.1322 | 0.0 | 0.9750 | 1.1322 | -31.6783 | -18.7479 | -3.4703 | -3.3980 |
| 0.2738 | 3.7989 | 680 | 0.3426 | 1.1336 | 0.0 | 0.9750 | 1.1336 | -31.5848 | -18.7342 | -3.4777 | -3.4049 |
| 0.223 | 3.9106 | 700 | 0.3426 | 1.1363 | 0.0 | 0.9750 | 1.1363 | -31.8657 | -18.7067 | -3.4798 | -3.4061 |
| 0.2073 | 4.0223 | 720 | 0.3417 | 1.1402 | 0.0 | 0.9750 | 1.1402 | -31.7798 | -18.6685 | -3.4826 | -3.4088 |
| 0.2014 | 4.1341 | 740 | 0.3472 | 1.1410 | 0.0 | 0.9500 | 1.1410 | -31.9067 | -18.6602 | -3.4940 | -3.4166 |
| 0.1948 | 4.2458 | 760 | 0.3518 | 1.1247 | 0.0 | 0.9500 | 1.1247 | -32.0948 | -18.8234 | -3.5043 | -3.4247 |
| 0.1866 | 4.3575 | 780 | 0.3509 | 1.1220 | 0.0 | 0.9250 | 1.1220 | -32.2522 | -18.8503 | -3.5080 | -3.4275 |
| 0.2203 | 4.4693 | 800 | 0.3512 | 1.1248 | 0.0 | 0.9250 | 1.1248 | -32.2119 | -18.8216 | -3.5013 | -3.4178 |
| 0.2205 | 4.5810 | 820 | 0.3510 | 1.1195 | 0.0 | 0.9250 | 1.1195 | -32.1996 | -18.8746 | -3.5068 | -3.4248 |
| 0.2595 | 4.6927 | 840 | 0.3507 | 1.1256 | 0.0 | 0.9250 | 1.1256 | -32.0943 | -18.8140 | -3.5082 | -3.4267 |
| 0.2227 | 4.8045 | 860 | 0.3507 | 1.1249 | 0.0 | 0.9250 | 1.1249 | -32.2668 | -18.8215 | -3.5116 | -3.4283 |
| 0.2003 | 4.9162 | 880 | 0.3515 | 1.1307 | 0.0 | 0.9250 | 1.1307 | -32.0981 | -18.7635 | -3.5139 | -3.4305 |
| 0.1739 | 5.0279 | 900 | 0.3514 | 1.1300 | 0.0 | 0.9250 | 1.1300 | -32.2913 | -18.7695 | -3.5157 | -3.4306 |
| 0.1738 | 5.1397 | 920 | 0.3549 | 1.1322 | 0.0 | 0.9250 | 1.1322 | -32.4277 | -18.7481 | -3.5290 | -3.4432 |
| 0.1929 | 5.2514 | 940 | 0.3553 | 1.1277 | 0.0 | 0.9250 | 1.1277 | -32.4081 | -18.7933 | -3.5179 | -3.4289 |
| 0.189 | 5.3631 | 960 | 0.3582 | 1.1324 | 0.0 | 0.9250 | 1.1324 | -32.4295 | -18.7463 | -3.5237 | -3.4345 |
| 0.1813 | 5.4749 | 980 | 0.3602 | 1.1267 | 0.0 | 0.9250 | 1.1267 | -32.4544 | -18.8029 | -3.5225 | -3.4308 |
| 0.2029 | 5.5866 | 1000 | 0.3599 | 1.1212 | 0.0 | 0.9250 | 1.1212 | -32.6243 | -18.8581 | -3.5247 | -3.4340 |
| 0.1749 | 5.6983 | 1020 | 0.3600 | 1.1256 | 0.0 | 0.9250 | 1.1256 | -32.6542 | -18.8145 | -3.5226 | -3.4313 |
| 0.2015 | 5.8101 | 1040 | 0.3602 | 1.1170 | 0.0 | 0.9000 | 1.1170 | -32.6964 | -18.9004 | -3.5208 | -3.4276 |
| 0.2101 | 5.9218 | 1060 | 0.3598 | 1.1177 | 0.0 | 0.9250 | 1.1177 | -32.6506 | -18.8927 | -3.5358 | -3.4462 |
| 0.1563 | 6.0335 | 1080 | 0.3603 | 1.1150 | 0.0 | 0.9000 | 1.1150 | -32.7835 | -18.9203 | -3.5304 | -3.4379 |
| 0.1475 | 6.1453 | 1100 | 0.3630 | 1.1060 | 0.0 | 0.9000 | 1.1060 | -32.9071 | -19.0102 | -3.5235 | -3.4285 |
| 0.1755 | 6.2570 | 1120 | 0.3642 | 1.1158 | 0.0 | 0.9000 | 1.1158 | -32.8912 | -18.9122 | -3.5306 | -3.4369 |
| 0.1843 | 6.3687 | 1140 | 0.3648 | 1.1185 | 0.0 | 0.9000 | 1.1185 | -32.8595 | -18.8852 | -3.5372 | -3.4455 |
| 0.1755 | 6.4804 | 1160 | 0.3657 | 1.1000 | 0.0 | 0.9000 | 1.1000 | -33.1383 | -19.0699 | -3.5363 | -3.4444 |
| 0.1796 | 6.5922 | 1180 | 0.3662 | 1.1118 | 0.0 | 0.9250 | 1.1118 | -32.8604 | -18.9520 | -3.5346 | -3.4422 |
| 0.1584 | 6.7039 | 1200 | 0.3657 | 1.1053 | 0.0 | 0.9000 | 1.1053 | -33.0480 | -19.0167 | -3.5364 | -3.4435 |
| 0.1621 | 6.8156 | 1220 | 0.3639 | 1.1071 | 0.0 | 0.9000 | 1.1071 | -33.0234 | -18.9991 | -3.5353 | -3.4420 |
| 0.2112 | 6.9274 | 1240 | 0.3662 | 1.1166 | 0.0 | 0.9250 | 1.1166 | -33.1209 | -18.9039 | -3.5362 | -3.4425 |
| 0.1614 | 7.0391 | 1260 | 0.3650 | 1.1068 | 0.0 | 0.9000 | 1.1068 | -32.9925 | -19.0016 | -3.5379 | -3.4448 |
| 0.1855 | 7.1508 | 1280 | 0.3670 | 1.1084 | 0.0 | 0.9000 | 1.1084 | -33.1895 | -18.9857 | -3.5339 | -3.4401 |
| 0.1533 | 7.2626 | 1300 | 0.3650 | 1.1118 | 0.0 | 0.9000 | 1.1118 | -33.1868 | -18.9523 | -3.5395 | -3.4467 |
| 0.1549 | 7.3743 | 1320 | 0.3651 | 1.1105 | 0.0 | 0.9250 | 1.1105 | -33.0131 | -18.9651 | -3.5402 | -3.4477 |
| 0.2011 | 7.4860 | 1340 | 0.3665 | 1.1055 | 0.0 | 0.9000 | 1.1055 | -33.2247 | -19.0150 | -3.5371 | -3.4434 |
| 0.2011 | 7.5978 | 1360 | 0.3658 | 1.1100 | 0.0 | 0.9000 | 1.1100 | -33.1080 | -18.9700 | -3.5330 | -3.4375 |
| 0.162 | 7.7095 | 1380 | 0.3661 | 1.1120 | 0.0 | 0.9000 | 1.1120 | -33.2256 | -18.9503 | -3.5422 | -3.4496 |
| 0.1768 | 7.8212 | 1400 | 0.3668 | 1.1069 | 0.0 | 0.9250 | 1.1069 | -33.1460 | -19.0009 | -3.5356 | -3.4417 |
| 0.1927 | 7.9330 | 1420 | 0.3653 | 1.1076 | 0.0 | 0.9250 | 1.1076 | -33.0177 | -18.9939 | -3.5369 | -3.4431 |
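
Two things stand out in the table. First, validation loss bottoms out near 0.337 around epochs 2.8–3 and then drifts back toward 0.366 over the remaining epochs while training loss keeps falling, the usual signature of mild overfitting. Second, the Rewards/* columns are the implicit DPO rewards β(log π_θ(y|x) − log π_ref(y|x)); here Rewards/rejected is pinned at 0.0 throughout, so Rewards/margins coincides with Rewards/chosen. For reference, the cDPO objective being minimized (with smoothing weight ε, whose value is not recorded in this card) is:

```latex
\mathcal{L}_{\mathrm{cDPO}}
  = -(1-\varepsilon)\,\log \sigma(\beta \Delta)
    - \varepsilon\,\log \sigma(-\beta \Delta),
\qquad
\Delta = \log\frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
       - \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)},
\qquad \beta = 0.1 .
```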
### Framework versions

- Transformers 4.45.2
- Pytorch 2.5.1+cu121
- Datasets 3.5.0
- Tokenizers 0.20.3