---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1
model-index:
- name: genv3pair1NoGT_1.5B_cdpo_ebs32_lr5e-07_beta0.1_epoch8.0_42
  results: []
---

# genv3pair1NoGT_1.5B_cdpo_ebs32_lr5e-07_beta0.1_epoch8.0_42

This model is a fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) on the YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3658
- Rewards/chosen: 1.1075
- Rewards/rejected: 0.0
- Rewards/accuracies: 0.9000
- Rewards/margins: 1.1075
- Logps/rejected: -33.0278
- Logps/chosen: -18.9948
- Logits/rejected: -3.5375
- Logits/chosen: -3.4431
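
For readers unfamiliar with these metrics: in standard DPO, the implicit reward of a response is the beta-scaled log-probability ratio between the policy and the frozen reference model, and `Rewards/margins` is simply `Rewards/chosen - Rewards/rejected`. The `cdpo` in the run name suggests a label-smoothed (conservative DPO) variant, which modifies the loss but not the reward definition. With beta = 0.1 as in this run:

```latex
% Implicit DPO reward of response y for prompt x:
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

% DPO objective over preference pairs (y_w chosen, y_l rejected):
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\bigl( r_\theta(x, y_w) - r_\theta(x, y_l) \bigr) \right]
```

The `Logps/*` columns are the policy's total log-probabilities of the chosen and rejected responses, from which these rewards are computed.
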
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 8.0
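
For reference, the sketch below maps these hyperparameters onto TRL's `DPOConfig`/`DPOTrainer`. It is a minimal sketch, not the actual training script: the dataset column names, the eval split, and the cDPO-style `label_smoothing` (suggested by the run name but not recorded on this card) are assumptions, and the exact `DPOTrainer` signature varies across TRL versions.

```python
# Minimal sketch of an equivalent TRL setup (API details follow roughly
# trl ~0.11 / transformers 4.45; adjust for your TRL version).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Qwen/Qwen2.5-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Assumption: the dataset exposes the standard "prompt"/"chosen"/"rejected"
# preference columns expected by DPOTrainer.
dataset = load_dataset(
    "YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1"
)

args = DPOConfig(
    output_dir="genv3pair1NoGT_1.5B_cdpo_ebs32_lr5e-07_beta0.1_epoch8.0_42",
    beta=0.1,                       # from the run name
    learning_rate=5e-7,
    per_device_train_batch_size=4,  # x 8 GPUs = effective batch size 32
    per_device_eval_batch_size=4,
    num_train_epochs=8.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # Assumption: "cdpo" in the run name suggests conservative DPO, i.e. a
    # nonzero label_smoothing; the value used is not recorded on this card.
    # label_smoothing=0.1,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,           # TRL clones the policy as the frozen reference
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset.get("test"),  # eval split name is an assumption
    tokenizer=tokenizer,      # newer TRL versions name this processing_class
)
trainer.train()
```
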
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6933 | 0.1117 | 20 | 0.6927 | -0.0050 | 0.0 | 0.4250 | -0.0050 | -41.6216 | -30.1197 | -2.2278 | -2.3742 |
| 0.6769 | 0.2235 | 40 | 0.6772 | 0.0340 | 0.0 | 0.8000 | 0.0340 | -41.3389 | -29.7304 | -2.2451 | -2.3885 |
| 0.6325 | 0.3352 | 60 | 0.6327 | 0.1201 | 0.0 | 0.9750 | 0.1201 | -40.3433 | -28.8695 | -2.3126 | -2.4487 |
| 0.552 | 0.4469 | 80 | 0.5472 | 0.3232 | 0.0 | 1.0 | 0.3232 | -38.1920 | -26.8384 | -2.4785 | -2.5917 |
| 0.4166 | 0.5587 | 100 | 0.4622 | 0.5449 | 0.0 | 1.0 | 0.5449 | -35.6187 | -24.6215 | -2.7023 | -2.7853 |
| 0.4124 | 0.6704 | 120 | 0.4133 | 0.7154 | 0.0 | 0.9750 | 0.7154 | -34.0244 | -22.9157 | -2.9427 | -2.9959 |
| 0.4072 | 0.7821 | 140 | 0.3868 | 0.8328 | 0.0 | 0.9750 | 0.8328 | -33.0483 | -21.7421 | -3.0564 | -3.0879 |
| 0.3682 | 0.8939 | 160 | 0.3694 | 0.9289 | 0.0 | 0.9750 | 0.9289 | -32.4181 | -20.7810 | -3.1688 | -3.1797 |
| 0.3347 | 1.0056 | 180 | 0.3606 | 0.9708 | 0.0 | 0.9750 | 0.9708 | -32.0167 | -20.3620 | -3.2109 | -3.2133 |
| 0.3428 | 1.1173 | 200 | 0.3570 | 0.9820 | 0.0 | 0.9750 | 0.9820 | -31.8110 | -20.2500 | -3.2368 | -3.2277 |
| 0.3198 | 1.2291 | 220 | 0.3539 | 0.9979 | 0.0 | 0.9750 | 0.9979 | -31.5782 | -20.0911 | -3.2694 | -3.2536 |
| 0.3279 | 1.3408 | 240 | 0.3513 | 1.0271 | 0.0 | 0.9750 | 1.0271 | -31.4944 | -19.7987 | -3.2925 | -3.2709 |
| 0.3299 | 1.4525 | 260 | 0.3501 | 1.0274 | 0.0 | 0.9750 | 1.0274 | -31.5820 | -19.7956 | -3.3043 | -3.2821 |
| 0.3102 | 1.5642 | 280 | 0.3456 | 1.0467 | 0.0 | 0.9750 | 1.0467 | -31.2429 | -19.6030 | -3.3094 | -3.2909 |
| 0.3649 | 1.6760 | 300 | 0.3460 | 1.0501 | 0.0 | 0.9750 | 1.0501 | -31.2992 | -19.5688 | -3.3220 | -3.2938 |
| 0.3392 | 1.7877 | 320 | 0.3441 | 1.0555 | 0.0 | 0.9750 | 1.0555 | -31.2234 | -19.5146 | -3.3241 | -3.2951 |
| 0.3424 | 1.8994 | 340 | 0.3420 | 1.0586 | 0.0 | 0.9750 | 1.0586 | -31.1263 | -19.4839 | -3.3264 | -3.2999 |
| 0.2988 | 2.0112 | 360 | 0.3409 | 1.0713 | 0.0 | 0.9750 | 1.0713 | -30.9231 | -19.3570 | -3.3451 | -3.3142 |
| 0.2874 | 2.1229 | 380 | 0.3425 | 1.0877 | 0.0 | 0.9750 | 1.0877 | -30.9979 | -19.1933 | -3.3711 | -3.3294 |
| 0.2657 | 2.2346 | 400 | 0.3420 | 1.0972 | 0.0 | 0.9750 | 1.0972 | -31.0448 | -19.0985 | -3.3937 | -3.3495 |
| 0.306 | 2.3464 | 420 | 0.3429 | 1.1068 | 0.0 | 0.9750 | 1.1068 | -31.1971 | -19.0020 | -3.4022 | -3.3514 |
| 0.2743 | 2.4581 | 440 | 0.3409 | 1.1059 | 0.0 | 0.9750 | 1.1059 | -31.2159 | -19.0107 | -3.3961 | -3.3463 |
| 0.2916 | 2.5698 | 460 | 0.3394 | 1.1093 | 0.0 | 0.9750 | 1.1093 | -31.0894 | -18.9768 | -3.4099 | -3.3627 |
| 0.2729 | 2.6816 | 480 | 0.3397 | 1.1164 | 0.0 | 0.9750 | 1.1164 | -31.1564 | -18.9062 | -3.4079 | -3.3561 |
| 0.2424 | 2.7933 | 500 | 0.3375 | 1.1261 | 0.0 | 0.9750 | 1.1261 | -31.1838 | -18.8088 | -3.4113 | -3.3589 |
| 0.2642 | 2.9050 | 520 | 0.3409 | 1.1068 | 0.0 | 0.9750 | 1.1068 | -31.2621 | -19.0017 | -3.4265 | -3.3712 |
| 0.2717 | 3.0168 | 540 | 0.3375 | 1.1340 | 0.0 | 0.9750 | 1.1340 | -31.1489 | -18.7302 | -3.4243 | -3.3692 |
| 0.2608 | 3.1285 | 560 | 0.3406 | 1.1287 | 0.0 | 0.9750 | 1.1287 | -31.4594 | -18.7828 | -3.4436 | -3.3809 |
| 0.2332 | 3.2402 | 580 | 0.3423 | 1.1310 | 0.0 | 0.9500 | 1.1310 | -31.4343 | -18.7603 | -3.4628 | -3.3948 |
| 0.2332 | 3.3520 | 600 | 0.3404 | 1.1220 | 0.0 | 0.9500 | 1.1220 | -31.3091 | -18.8498 | -3.4676 | -3.4024 |
| 0.2239 | 3.4637 | 620 | 0.3416 | 1.1348 | 0.0 | 0.9500 | 1.1348 | -31.3410 | -18.7218 | -3.4763 | -3.4094 |
| 0.2347 | 3.5754 | 640 | 0.3426 | 1.1295 | 0.0 | 0.9500 | 1.1295 | -31.4972 | -18.7749 | -3.4646 | -3.3927 |
| 0.2427 | 3.6872 | 660 | 0.3427 | 1.1322 | 0.0 | 0.9750 | 1.1322 | -31.6783 | -18.7479 | -3.4703 | -3.3980 |
| 0.2738 | 3.7989 | 680 | 0.3426 | 1.1336 | 0.0 | 0.9750 | 1.1336 | -31.5848 | -18.7342 | -3.4777 | -3.4049 |
| 0.223 | 3.9106 | 700 | 0.3426 | 1.1363 | 0.0 | 0.9750 | 1.1363 | -31.8657 | -18.7067 | -3.4798 | -3.4061 |
| 0.2073 | 4.0223 | 720 | 0.3417 | 1.1402 | 0.0 | 0.9750 | 1.1402 | -31.7798 | -18.6685 | -3.4826 | -3.4088 |
| 0.2014 | 4.1341 | 740 | 0.3472 | 1.1410 | 0.0 | 0.9500 | 1.1410 | -31.9067 | -18.6602 | -3.4940 | -3.4166 |
| 0.1948 | 4.2458 | 760 | 0.3518 | 1.1247 | 0.0 | 0.9500 | 1.1247 | -32.0948 | -18.8234 | -3.5043 | -3.4247 |
| 0.1866 | 4.3575 | 780 | 0.3509 | 1.1220 | 0.0 | 0.9250 | 1.1220 | -32.2522 | -18.8503 | -3.5080 | -3.4275 |
| 0.2203 | 4.4693 | 800 | 0.3512 | 1.1248 | 0.0 | 0.9250 | 1.1248 | -32.2119 | -18.8216 | -3.5013 | -3.4178 |
| 0.2205 | 4.5810 | 820 | 0.3510 | 1.1195 | 0.0 | 0.9250 | 1.1195 | -32.1996 | -18.8746 | -3.5068 | -3.4248 |
| 0.2595 | 4.6927 | 840 | 0.3507 | 1.1256 | 0.0 | 0.9250 | 1.1256 | -32.0943 | -18.8140 | -3.5082 | -3.4267 |
| 0.2227 | 4.8045 | 860 | 0.3507 | 1.1249 | 0.0 | 0.9250 | 1.1249 | -32.2668 | -18.8215 | -3.5116 | -3.4283 |
| 0.2003 | 4.9162 | 880 | 0.3515 | 1.1307 | 0.0 | 0.9250 | 1.1307 | -32.0981 | -18.7635 | -3.5139 | -3.4305 |
| 0.1739 | 5.0279 | 900 | 0.3514 | 1.1300 | 0.0 | 0.9250 | 1.1300 | -32.2913 | -18.7695 | -3.5157 | -3.4306 |
| 0.1738 | 5.1397 | 920 | 0.3549 | 1.1322 | 0.0 | 0.9250 | 1.1322 | -32.4277 | -18.7481 | -3.5290 | -3.4432 |
| 0.1929 | 5.2514 | 940 | 0.3553 | 1.1277 | 0.0 | 0.9250 | 1.1277 | -32.4081 | -18.7933 | -3.5179 | -3.4289 |
| 0.189 | 5.3631 | 960 | 0.3582 | 1.1324 | 0.0 | 0.9250 | 1.1324 | -32.4295 | -18.7463 | -3.5237 | -3.4345 |
| 0.1813 | 5.4749 | 980 | 0.3602 | 1.1267 | 0.0 | 0.9250 | 1.1267 | -32.4544 | -18.8029 | -3.5225 | -3.4308 |
| 0.2029 | 5.5866 | 1000 | 0.3599 | 1.1212 | 0.0 | 0.9250 | 1.1212 | -32.6243 | -18.8581 | -3.5247 | -3.4340 |
| 0.1749 | 5.6983 | 1020 | 0.3600 | 1.1256 | 0.0 | 0.9250 | 1.1256 | -32.6542 | -18.8145 | -3.5226 | -3.4313 |
| 0.2015 | 5.8101 | 1040 | 0.3602 | 1.1170 | 0.0 | 0.9000 | 1.1170 | -32.6964 | -18.9004 | -3.5208 | -3.4276 |
| 0.2101 | 5.9218 | 1060 | 0.3598 | 1.1177 | 0.0 | 0.9250 | 1.1177 | -32.6506 | -18.8927 | -3.5358 | -3.4462 |
| 0.1563 | 6.0335 | 1080 | 0.3603 | 1.1150 | 0.0 | 0.9000 | 1.1150 | -32.7835 | -18.9203 | -3.5304 | -3.4379 |
| 0.1475 | 6.1453 | 1100 | 0.3630 | 1.1060 | 0.0 | 0.9000 | 1.1060 | -32.9071 | -19.0102 | -3.5235 | -3.4285 |
| 0.1755 | 6.2570 | 1120 | 0.3642 | 1.1158 | 0.0 | 0.9000 | 1.1158 | -32.8912 | -18.9122 | -3.5306 | -3.4369 |
| 0.1843 | 6.3687 | 1140 | 0.3648 | 1.1185 | 0.0 | 0.9000 | 1.1185 | -32.8595 | -18.8852 | -3.5372 | -3.4455 |
| 0.1755 | 6.4804 | 1160 | 0.3657 | 1.1000 | 0.0 | 0.9000 | 1.1000 | -33.1383 | -19.0699 | -3.5363 | -3.4444 |
| 0.1796 | 6.5922 | 1180 | 0.3662 | 1.1118 | 0.0 | 0.9250 | 1.1118 | -32.8604 | -18.9520 | -3.5346 | -3.4422 |
| 0.1584 | 6.7039 | 1200 | 0.3657 | 1.1053 | 0.0 | 0.9000 | 1.1053 | -33.0480 | -19.0167 | -3.5364 | -3.4435 |
| 0.1621 | 6.8156 | 1220 | 0.3639 | 1.1071 | 0.0 | 0.9000 | 1.1071 | -33.0234 | -18.9991 | -3.5353 | -3.4420 |
| 0.2112 | 6.9274 | 1240 | 0.3662 | 1.1166 | 0.0 | 0.9250 | 1.1166 | -33.1209 | -18.9039 | -3.5362 | -3.4425 |
| 0.1614 | 7.0391 | 1260 | 0.3650 | 1.1068 | 0.0 | 0.9000 | 1.1068 | -32.9925 | -19.0016 | -3.5379 | -3.4448 |
| 0.1855 | 7.1508 | 1280 | 0.3670 | 1.1084 | 0.0 | 0.9000 | 1.1084 | -33.1895 | -18.9857 | -3.5339 | -3.4401 |
| 0.1533 | 7.2626 | 1300 | 0.3650 | 1.1118 | 0.0 | 0.9000 | 1.1118 | -33.1868 | -18.9523 | -3.5395 | -3.4467 |
| 0.1549 | 7.3743 | 1320 | 0.3651 | 1.1105 | 0.0 | 0.9250 | 1.1105 | -33.0131 | -18.9651 | -3.5402 | -3.4477 |
| 0.2011 | 7.4860 | 1340 | 0.3665 | 1.1055 | 0.0 | 0.9000 | 1.1055 | -33.2247 | -19.0150 | -3.5371 | -3.4434 |
| 0.2011 | 7.5978 | 1360 | 0.3658 | 1.1100 | 0.0 | 0.9000 | 1.1100 | -33.1080 | -18.9700 | -3.5330 | -3.4375 |
| 0.162 | 7.7095 | 1380 | 0.3661 | 1.1120 | 0.0 | 0.9000 | 1.1120 | -33.2256 | -18.9503 | -3.5422 | -3.4496 |
| 0.1768 | 7.8212 | 1400 | 0.3668 | 1.1069 | 0.0 | 0.9250 | 1.1069 | -33.1460 | -19.0009 | -3.5356 | -3.4417 |
| 0.1927 | 7.9330 | 1420 | 0.3653 | 1.1076 | 0.0 | 0.9250 | 1.1076 | -33.0177 | -18.9939 | -3.5369 | -3.4431 |

### Framework versions

- Transformers 4.45.2
- Pytorch 2.5.1+cu121
- Datasets 3.5.0
- Tokenizers 0.20.3
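
Since the card does not yet include usage instructions, here is a minimal chat-style inference sketch with `transformers`. The repository id below is an assumption (the model name under the dataset author's namespace); substitute the actual path where the checkpoint is hosted.

```python
# Minimal inference sketch; the repo id is an assumption based on the model
# name, not a confirmed path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YuchenLi01/genv3pair1NoGT_1.5B_cdpo_ebs32_lr5e-07_beta0.1_epoch8.0_42"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "What is the remainder when 2^10 is divided by 7?"}
]
# Qwen2.5-Instruct models ship a chat template, so apply it to the messages.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```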