genv3pair1NoGT_1.5B_cdpo_ebs32_lr5e-07_beta0.1_epoch8.0_42

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3658
  • Rewards/chosen: 1.1075
  • Rewards/rejected: 0.0
  • Rewards/accuracies: 0.9000
  • Rewards/margins: 1.1075
  • Logps/rejected: -33.0278
  • Logps/chosen: -18.9948
  • Logits/rejected: -3.5375
  • Logits/chosen: -3.4431
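
A minimal usage sketch with the Transformers library. The prompt and generation settings are illustrative assumptions, not part of this card; the base model's chat template is assumed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YuchenLi01/genv3pair1NoGT_1.5B_cdpo_ebs32_lr5e-07_beta0.1_epoch8.0_42"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# An illustrative MATH-style prompt; the card does not prescribe a prompt format.
messages = [{"role": "user", "content": "Solve for x: 2x + 3 = 11."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```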

Model description

Qwen/Qwen2.5-1.5B-Instruct aligned with pairwise preference optimization on responses to MATH problems. Per the run name, training used conservative DPO (cDPO) with β = 0.1, an effective batch size of 32, a learning rate of 5e-07, 8 epochs, and seed 42.

Intended uses & limitations

More information needed

Training and evaluation data

The model was trained and evaluated on YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1 which, per its name, pairs unique Qwen2.5-1.5B-Instruct responses to MATH problems as chosen/rejected preference data without ground-truth solutions.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 8.0
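
The run name indicates a cDPO objective with β = 0.1. Below is a minimal sketch of that loss (standard DPO with label smoothing over possibly-noisy preference labels); the function and its defaults are illustrative, not the actual training code. The `chosen_rewards` and `rejected_rewards` terms are what the Rewards/chosen and Rewards/rejected metrics on this card report.

```python
import torch
import torch.nn.functional as F

def cdpo_loss(policy_chosen_logps: torch.Tensor,
              policy_rejected_logps: torch.Tensor,
              ref_chosen_logps: torch.Tensor,
              ref_rejected_logps: torch.Tensor,
              beta: float = 0.1,
              label_smoothing: float = 0.1):  # smoothing value not reported on this card
    """Conservative DPO: DPO with label smoothing over noisy preferences."""
    # Implicit rewards: beta-scaled log-ratios of policy vs. reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards
    # With label_smoothing = 0 this reduces to the standard DPO loss.
    losses = (-F.logsigmoid(margins) * (1.0 - label_smoothing)
              - F.logsigmoid(-margins) * label_smoothing)
    return losses.mean(), chosen_rewards.detach(), rejected_rewards.detach()
```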

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
| 0.6933 | 0.1117 | 20 | 0.6927 | -0.0050 | 0.0 | 0.4250 | -0.0050 | -41.6216 | -30.1197 | -2.2278 | -2.3742 |
| 0.6769 | 0.2235 | 40 | 0.6772 | 0.0340 | 0.0 | 0.8000 | 0.0340 | -41.3389 | -29.7304 | -2.2451 | -2.3885 |
| 0.6325 | 0.3352 | 60 | 0.6327 | 0.1201 | 0.0 | 0.9750 | 0.1201 | -40.3433 | -28.8695 | -2.3126 | -2.4487 |
| 0.552 | 0.4469 | 80 | 0.5472 | 0.3232 | 0.0 | 1.0 | 0.3232 | -38.1920 | -26.8384 | -2.4785 | -2.5917 |
| 0.4166 | 0.5587 | 100 | 0.4622 | 0.5449 | 0.0 | 1.0 | 0.5449 | -35.6187 | -24.6215 | -2.7023 | -2.7853 |
| 0.4124 | 0.6704 | 120 | 0.4133 | 0.7154 | 0.0 | 0.9750 | 0.7154 | -34.0244 | -22.9157 | -2.9427 | -2.9959 |
| 0.4072 | 0.7821 | 140 | 0.3868 | 0.8328 | 0.0 | 0.9750 | 0.8328 | -33.0483 | -21.7421 | -3.0564 | -3.0879 |
| 0.3682 | 0.8939 | 160 | 0.3694 | 0.9289 | 0.0 | 0.9750 | 0.9289 | -32.4181 | -20.7810 | -3.1688 | -3.1797 |
| 0.3347 | 1.0056 | 180 | 0.3606 | 0.9708 | 0.0 | 0.9750 | 0.9708 | -32.0167 | -20.3620 | -3.2109 | -3.2133 |
| 0.3428 | 1.1173 | 200 | 0.3570 | 0.9820 | 0.0 | 0.9750 | 0.9820 | -31.8110 | -20.2500 | -3.2368 | -3.2277 |
| 0.3198 | 1.2291 | 220 | 0.3539 | 0.9979 | 0.0 | 0.9750 | 0.9979 | -31.5782 | -20.0911 | -3.2694 | -3.2536 |
| 0.3279 | 1.3408 | 240 | 0.3513 | 1.0271 | 0.0 | 0.9750 | 1.0271 | -31.4944 | -19.7987 | -3.2925 | -3.2709 |
| 0.3299 | 1.4525 | 260 | 0.3501 | 1.0274 | 0.0 | 0.9750 | 1.0274 | -31.5820 | -19.7956 | -3.3043 | -3.2821 |
| 0.3102 | 1.5642 | 280 | 0.3456 | 1.0467 | 0.0 | 0.9750 | 1.0467 | -31.2429 | -19.6030 | -3.3094 | -3.2909 |
| 0.3649 | 1.6760 | 300 | 0.3460 | 1.0501 | 0.0 | 0.9750 | 1.0501 | -31.2992 | -19.5688 | -3.3220 | -3.2938 |
| 0.3392 | 1.7877 | 320 | 0.3441 | 1.0555 | 0.0 | 0.9750 | 1.0555 | -31.2234 | -19.5146 | -3.3241 | -3.2951 |
| 0.3424 | 1.8994 | 340 | 0.3420 | 1.0586 | 0.0 | 0.9750 | 1.0586 | -31.1263 | -19.4839 | -3.3264 | -3.2999 |
| 0.2988 | 2.0112 | 360 | 0.3409 | 1.0713 | 0.0 | 0.9750 | 1.0713 | -30.9231 | -19.3570 | -3.3451 | -3.3142 |
| 0.2874 | 2.1229 | 380 | 0.3425 | 1.0877 | 0.0 | 0.9750 | 1.0877 | -30.9979 | -19.1933 | -3.3711 | -3.3294 |
| 0.2657 | 2.2346 | 400 | 0.3420 | 1.0972 | 0.0 | 0.9750 | 1.0972 | -31.0448 | -19.0985 | -3.3937 | -3.3495 |
| 0.306 | 2.3464 | 420 | 0.3429 | 1.1068 | 0.0 | 0.9750 | 1.1068 | -31.1971 | -19.0020 | -3.4022 | -3.3514 |
| 0.2743 | 2.4581 | 440 | 0.3409 | 1.1059 | 0.0 | 0.9750 | 1.1059 | -31.2159 | -19.0107 | -3.3961 | -3.3463 |
| 0.2916 | 2.5698 | 460 | 0.3394 | 1.1093 | 0.0 | 0.9750 | 1.1093 | -31.0894 | -18.9768 | -3.4099 | -3.3627 |
| 0.2729 | 2.6816 | 480 | 0.3397 | 1.1164 | 0.0 | 0.9750 | 1.1164 | -31.1564 | -18.9062 | -3.4079 | -3.3561 |
| 0.2424 | 2.7933 | 500 | 0.3375 | 1.1261 | 0.0 | 0.9750 | 1.1261 | -31.1838 | -18.8088 | -3.4113 | -3.3589 |
| 0.2642 | 2.9050 | 520 | 0.3409 | 1.1068 | 0.0 | 0.9750 | 1.1068 | -31.2621 | -19.0017 | -3.4265 | -3.3712 |
| 0.2717 | 3.0168 | 540 | 0.3375 | 1.1340 | 0.0 | 0.9750 | 1.1340 | -31.1489 | -18.7302 | -3.4243 | -3.3692 |
| 0.2608 | 3.1285 | 560 | 0.3406 | 1.1287 | 0.0 | 0.9750 | 1.1287 | -31.4594 | -18.7828 | -3.4436 | -3.3809 |
| 0.2332 | 3.2402 | 580 | 0.3423 | 1.1310 | 0.0 | 0.9500 | 1.1310 | -31.4343 | -18.7603 | -3.4628 | -3.3948 |
| 0.2332 | 3.3520 | 600 | 0.3404 | 1.1220 | 0.0 | 0.9500 | 1.1220 | -31.3091 | -18.8498 | -3.4676 | -3.4024 |
| 0.2239 | 3.4637 | 620 | 0.3416 | 1.1348 | 0.0 | 0.9500 | 1.1348 | -31.3410 | -18.7218 | -3.4763 | -3.4094 |
| 0.2347 | 3.5754 | 640 | 0.3426 | 1.1295 | 0.0 | 0.9500 | 1.1295 | -31.4972 | -18.7749 | -3.4646 | -3.3927 |
| 0.2427 | 3.6872 | 660 | 0.3427 | 1.1322 | 0.0 | 0.9750 | 1.1322 | -31.6783 | -18.7479 | -3.4703 | -3.3980 |
| 0.2738 | 3.7989 | 680 | 0.3426 | 1.1336 | 0.0 | 0.9750 | 1.1336 | -31.5848 | -18.7342 | -3.4777 | -3.4049 |
| 0.223 | 3.9106 | 700 | 0.3426 | 1.1363 | 0.0 | 0.9750 | 1.1363 | -31.8657 | -18.7067 | -3.4798 | -3.4061 |
| 0.2073 | 4.0223 | 720 | 0.3417 | 1.1402 | 0.0 | 0.9750 | 1.1402 | -31.7798 | -18.6685 | -3.4826 | -3.4088 |
| 0.2014 | 4.1341 | 740 | 0.3472 | 1.1410 | 0.0 | 0.9500 | 1.1410 | -31.9067 | -18.6602 | -3.4940 | -3.4166 |
| 0.1948 | 4.2458 | 760 | 0.3518 | 1.1247 | 0.0 | 0.9500 | 1.1247 | -32.0948 | -18.8234 | -3.5043 | -3.4247 |
| 0.1866 | 4.3575 | 780 | 0.3509 | 1.1220 | 0.0 | 0.9250 | 1.1220 | -32.2522 | -18.8503 | -3.5080 | -3.4275 |
| 0.2203 | 4.4693 | 800 | 0.3512 | 1.1248 | 0.0 | 0.9250 | 1.1248 | -32.2119 | -18.8216 | -3.5013 | -3.4178 |
| 0.2205 | 4.5810 | 820 | 0.3510 | 1.1195 | 0.0 | 0.9250 | 1.1195 | -32.1996 | -18.8746 | -3.5068 | -3.4248 |
| 0.2595 | 4.6927 | 840 | 0.3507 | 1.1256 | 0.0 | 0.9250 | 1.1256 | -32.0943 | -18.8140 | -3.5082 | -3.4267 |
| 0.2227 | 4.8045 | 860 | 0.3507 | 1.1249 | 0.0 | 0.9250 | 1.1249 | -32.2668 | -18.8215 | -3.5116 | -3.4283 |
| 0.2003 | 4.9162 | 880 | 0.3515 | 1.1307 | 0.0 | 0.9250 | 1.1307 | -32.0981 | -18.7635 | -3.5139 | -3.4305 |
| 0.1739 | 5.0279 | 900 | 0.3514 | 1.1300 | 0.0 | 0.9250 | 1.1300 | -32.2913 | -18.7695 | -3.5157 | -3.4306 |
| 0.1738 | 5.1397 | 920 | 0.3549 | 1.1322 | 0.0 | 0.9250 | 1.1322 | -32.4277 | -18.7481 | -3.5290 | -3.4432 |
| 0.1929 | 5.2514 | 940 | 0.3553 | 1.1277 | 0.0 | 0.9250 | 1.1277 | -32.4081 | -18.7933 | -3.5179 | -3.4289 |
| 0.189 | 5.3631 | 960 | 0.3582 | 1.1324 | 0.0 | 0.9250 | 1.1324 | -32.4295 | -18.7463 | -3.5237 | -3.4345 |
| 0.1813 | 5.4749 | 980 | 0.3602 | 1.1267 | 0.0 | 0.9250 | 1.1267 | -32.4544 | -18.8029 | -3.5225 | -3.4308 |
| 0.2029 | 5.5866 | 1000 | 0.3599 | 1.1212 | 0.0 | 0.9250 | 1.1212 | -32.6243 | -18.8581 | -3.5247 | -3.4340 |
| 0.1749 | 5.6983 | 1020 | 0.3600 | 1.1256 | 0.0 | 0.9250 | 1.1256 | -32.6542 | -18.8145 | -3.5226 | -3.4313 |
| 0.2015 | 5.8101 | 1040 | 0.3602 | 1.1170 | 0.0 | 0.9000 | 1.1170 | -32.6964 | -18.9004 | -3.5208 | -3.4276 |
| 0.2101 | 5.9218 | 1060 | 0.3598 | 1.1177 | 0.0 | 0.9250 | 1.1177 | -32.6506 | -18.8927 | -3.5358 | -3.4462 |
| 0.1563 | 6.0335 | 1080 | 0.3603 | 1.1150 | 0.0 | 0.9000 | 1.1150 | -32.7835 | -18.9203 | -3.5304 | -3.4379 |
| 0.1475 | 6.1453 | 1100 | 0.3630 | 1.1060 | 0.0 | 0.9000 | 1.1060 | -32.9071 | -19.0102 | -3.5235 | -3.4285 |
| 0.1755 | 6.2570 | 1120 | 0.3642 | 1.1158 | 0.0 | 0.9000 | 1.1158 | -32.8912 | -18.9122 | -3.5306 | -3.4369 |
| 0.1843 | 6.3687 | 1140 | 0.3648 | 1.1185 | 0.0 | 0.9000 | 1.1185 | -32.8595 | -18.8852 | -3.5372 | -3.4455 |
| 0.1755 | 6.4804 | 1160 | 0.3657 | 1.1000 | 0.0 | 0.9000 | 1.1000 | -33.1383 | -19.0699 | -3.5363 | -3.4444 |
| 0.1796 | 6.5922 | 1180 | 0.3662 | 1.1118 | 0.0 | 0.9250 | 1.1118 | -32.8604 | -18.9520 | -3.5346 | -3.4422 |
| 0.1584 | 6.7039 | 1200 | 0.3657 | 1.1053 | 0.0 | 0.9000 | 1.1053 | -33.0480 | -19.0167 | -3.5364 | -3.4435 |
| 0.1621 | 6.8156 | 1220 | 0.3639 | 1.1071 | 0.0 | 0.9000 | 1.1071 | -33.0234 | -18.9991 | -3.5353 | -3.4420 |
| 0.2112 | 6.9274 | 1240 | 0.3662 | 1.1166 | 0.0 | 0.9250 | 1.1166 | -33.1209 | -18.9039 | -3.5362 | -3.4425 |
| 0.1614 | 7.0391 | 1260 | 0.3650 | 1.1068 | 0.0 | 0.9000 | 1.1068 | -32.9925 | -19.0016 | -3.5379 | -3.4448 |
| 0.1855 | 7.1508 | 1280 | 0.3670 | 1.1084 | 0.0 | 0.9000 | 1.1084 | -33.1895 | -18.9857 | -3.5339 | -3.4401 |
| 0.1533 | 7.2626 | 1300 | 0.3650 | 1.1118 | 0.0 | 0.9000 | 1.1118 | -33.1868 | -18.9523 | -3.5395 | -3.4467 |
| 0.1549 | 7.3743 | 1320 | 0.3651 | 1.1105 | 0.0 | 0.9250 | 1.1105 | -33.0131 | -18.9651 | -3.5402 | -3.4477 |
| 0.2011 | 7.4860 | 1340 | 0.3665 | 1.1055 | 0.0 | 0.9000 | 1.1055 | -33.2247 | -19.0150 | -3.5371 | -3.4434 |
| 0.2011 | 7.5978 | 1360 | 0.3658 | 1.1100 | 0.0 | 0.9000 | 1.1100 | -33.1080 | -18.9700 | -3.5330 | -3.4375 |
| 0.162 | 7.7095 | 1380 | 0.3661 | 1.1120 | 0.0 | 0.9000 | 1.1120 | -33.2256 | -18.9503 | -3.5422 | -3.4496 |
| 0.1768 | 7.8212 | 1400 | 0.3668 | 1.1069 | 0.0 | 0.9250 | 1.1069 | -33.1460 | -19.0009 | -3.5356 | -3.4417 |
| 0.1927 | 7.9330 | 1420 | 0.3653 | 1.1076 | 0.0 | 0.9250 | 1.1076 | -33.0177 | -18.9939 | -3.5369 | -3.4431 |

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu121
  • Datasets 3.5.0
  • Tokenizers 0.20.3
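
A quick sanity check that a local environment matches the versions above (a convenience sketch; newer releases will often work too):

```python
import datasets
import tokenizers
import torch
import transformers

# Versions reported on this card; an exact match is not strictly required.
expected = {transformers: "4.45.2", torch: "2.5.1", datasets: "3.5.0", tokenizers: "0.20.3"}
for module, version in expected.items():
    status = "OK" if module.__version__.startswith(version) else "differs"
    print(f"{module.__name__} {module.__version__} (card: {version}) -> {status}")
```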