YuchenLi01 committed · verified · Parent(s): 0512017
Commit 6c835a6

Model save
README.md ADDED
---
library_name: transformers
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-05_beta0.1_epoch8.0_42
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-05_beta0.1_epoch8.0_42

This model was trained from scratch on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 2.2658
- Rewards/chosen: -1.3596
- Rewards/rejected: 0.0
- Rewards/accuracies: 0.1750
- Rewards/margins: -1.3596
- Logps/rejected: -69.9729
- Logps/chosen: -43.6664
- Logits/rejected: -3.6490
- Logits/chosen: -3.5141
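
These reward columns follow TRL's DPO logging convention: a sequence's reward is beta times the gap between policy and reference log-probabilities of the response, the margin is the chosen reward minus the rejected reward, and the accuracy is the fraction of pairs with a positive margin. A minimal sketch of that bookkeeping, assuming TRL's standard definitions (the input tensors are illustrative, not values from this run):

```python
import torch

beta = 0.1  # matches beta0.1 in the run name and the hyperparameters below

def dpo_reward_stats(policy_chosen_logps: torch.Tensor,
                     ref_chosen_logps: torch.Tensor,
                     policy_rejected_logps: torch.Tensor,
                     ref_rejected_logps: torch.Tensor) -> dict:
    """Reproduce the Rewards/* metrics from per-sequence log-probs."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards
    return {
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/margins": margins.mean().item(),
        "rewards/accuracies": (margins > 0).float().mean().item(),
    }
```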

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an illustrative TRL configuration mirroring them is sketched after the list):
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 8.0
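
The run name and tags indicate cDPO (DPO with a label-smoothed, noise-tolerant loss). A hedged sketch of a matching setup with TRL's `DPOTrainer` follows; the base model id, dataset id, and the cDPO label-smoothing value are placeholders, since the card does not record them:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

BASE = "your-org/your-1.5B-base"  # placeholder: base checkpoint not recorded in this card
model = AutoModelForCausalLM.from_pretrained(BASE)
tokenizer = AutoTokenizer.from_pretrained(BASE)
pairs = load_dataset("your-org/preference-pairs", split="train")  # placeholder dataset

config = DPOConfig(
    output_dir="genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-05_beta0.1_epoch8.0_42",
    beta=0.1,
    loss_type="sigmoid",
    label_smoothing=0.1,            # >0 turns the sigmoid loss into cDPO; actual value unknown
    learning_rate=1e-5,
    per_device_train_batch_size=4,  # x 8 GPUs = effective train batch size 32
    per_device_eval_batch_size=4,
    num_train_epochs=8.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(model=model, args=config, train_dataset=pairs, tokenizer=tokenizer)
trainer.train()
```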

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5466 | 0.1117 | 20 | 0.5253 | 0.3756 | 0.0 | 1.0000 | 0.3756 | -37.5665 | -26.3140 | -2.5322 | -2.6369 |
| 0.4059 | 0.2235 | 40 | 0.4084 | 0.9166 | 0.0 | 0.9000 | 0.9166 | -33.2775 | -20.9040 | -3.3833 | -3.3405 |
| 0.4138 | 0.3352 | 60 | 0.3843 | 0.9750 | 0.0 | 1.0000 | 0.9750 | -32.0425 | -20.3199 | -3.3283 | -3.3019 |
| 0.3939 | 0.4469 | 80 | 0.4022 | 0.8901 | 0.0 | 0.9500 | 0.8901 | -32.4997 | -21.1695 | -3.3332 | -3.3071 |
| 0.4055 | 0.5587 | 100 | 0.4635 | 0.7532 | 0.0 | 0.8750 | 0.7532 | -34.6906 | -22.5383 | -3.3361 | -3.3059 |
| 0.4772 | 0.6704 | 120 | 0.5260 | 0.6294 | 0.0 | 0.8750 | 0.6294 | -36.9209 | -23.7756 | -3.2411 | -3.2326 |
| 0.5798 | 0.7821 | 140 | 0.6274 | 0.3853 | 0.0 | 0.7750 | 0.3853 | -38.5313 | -26.2172 | -3.0298 | -3.0459 |
| 0.8527 | 0.8939 | 160 | 0.7349 | 0.2142 | 0.0 | 0.6500 | 0.2142 | -40.1633 | -27.9280 | -2.9920 | -3.0142 |
| 0.3929 | 1.0056 | 180 | 0.7815 | 0.1006 | 0.0 | 0.6000 | 0.1006 | -41.8164 | -29.0638 | -3.0261 | -3.0338 |
| 0.3188 | 1.1173 | 200 | 0.9600 | -0.1511 | 0.0 | 0.3750 | -0.1511 | -43.9392 | -31.5815 | -3.1820 | -3.1691 |
| 0.5336 | 1.2291 | 220 | 0.9597 | -0.1903 | 0.0 | 0.4750 | -0.1903 | -45.7020 | -31.9727 | -3.0984 | -3.0886 |
| 0.4445 | 1.3408 | 240 | 1.0146 | -0.1911 | 0.0 | 0.4250 | -0.1911 | -46.0488 | -31.9815 | -3.2781 | -3.2460 |
| 0.6491 | 1.4525 | 260 | 0.9691 | -0.0833 | 0.0 | 0.5000 | -0.0833 | -44.8210 | -30.9032 | -3.1003 | -3.0880 |
| 0.5130 | 1.5642 | 280 | 0.9466 | -0.0891 | 0.0 | 0.4250 | -0.0891 | -45.4357 | -30.9608 | -3.0974 | -3.0767 |
| 0.4424 | 1.6760 | 300 | 0.9961 | -0.2216 | 0.0 | 0.3250 | -0.2216 | -45.7439 | -32.2863 | -3.1617 | -3.1317 |
| 0.5232 | 1.7877 | 320 | 0.9826 | -0.1976 | 0.0 | 0.3500 | -0.1976 | -45.1696 | -32.0464 | -3.0142 | -3.0153 |
| 0.5069 | 1.8994 | 340 | 0.9945 | -0.0783 | 0.0 | 0.4500 | -0.0783 | -46.3133 | -30.8527 | -3.1964 | -3.1637 |
| 0.2684 | 2.0112 | 360 | 1.0036 | -0.0981 | 0.0 | 0.5000 | -0.0981 | -45.3146 | -31.0515 | -3.0113 | -3.0037 |
| 0.3519 | 2.1229 | 380 | 1.1985 | -0.3804 | 0.0 | 0.3500 | -0.3804 | -48.5596 | -33.8738 | -3.1878 | -3.1420 |
| 0.3155 | 2.2346 | 400 | 1.1909 | -0.3289 | 0.0 | 0.3250 | -0.3289 | -48.8887 | -33.3588 | -3.1928 | -3.1583 |
| 0.3517 | 2.3464 | 420 | 1.2317 | -0.4161 | 0.0 | 0.3250 | -0.4161 | -51.7570 | -34.2306 | -3.1170 | -3.0905 |
| 0.2643 | 2.4581 | 440 | 1.2229 | -0.4421 | 0.0 | 0.3000 | -0.4421 | -49.8322 | -34.4914 | -3.0995 | -3.0788 |
| 0.2984 | 2.5698 | 460 | 1.1842 | -0.4098 | 0.0 | 0.3250 | -0.4098 | -47.7830 | -34.1680 | -2.9851 | -2.9781 |
| 0.2776 | 2.6816 | 480 | 1.2433 | -0.4190 | 0.0 | 0.3250 | -0.4190 | -48.9492 | -34.2604 | -3.0807 | -3.0566 |
| 0.2270 | 2.7933 | 500 | 1.2486 | -0.3790 | 0.0 | 0.3250 | -0.3790 | -49.0447 | -33.8596 | -3.0917 | -3.0775 |
| 0.3012 | 2.9050 | 520 | 1.1795 | -0.2955 | 0.0 | 0.3500 | -0.2955 | -49.5190 | -33.0246 | -3.0727 | -3.0613 |
| 0.2164 | 3.0168 | 540 | 1.2305 | -0.4498 | 0.0 | 0.3500 | -0.4498 | -48.9411 | -34.5676 | -3.1648 | -3.1365 |
| 0.2851 | 3.1285 | 560 | 1.4187 | -0.5654 | 0.0 | 0.2250 | -0.5654 | -52.5012 | -35.7240 | -3.3471 | -3.2835 |
| 0.2124 | 3.2402 | 580 | 1.3477 | -0.5484 | 0.0 | 0.2250 | -0.5484 | -52.8059 | -35.5544 | -3.2139 | -3.1734 |
| 0.2053 | 3.3520 | 600 | 1.4134 | -0.6614 | 0.0 | 0.2250 | -0.6614 | -53.6831 | -36.6840 | -3.2238 | -3.1798 |
| 0.3601 | 3.4637 | 620 | 1.4628 | -0.7559 | 0.0 | 0.2250 | -0.7559 | -54.5884 | -37.6292 | -3.2911 | -3.2421 |
| 0.1913 | 3.5754 | 640 | 1.4378 | -0.6411 | 0.0 | 0.2250 | -0.6411 | -53.7149 | -36.4812 | -3.2762 | -3.2227 |
| 0.2378 | 3.6872 | 660 | 1.4228 | -0.7470 | 0.0 | 0.2000 | -0.7470 | -54.1003 | -37.5405 | -3.2093 | -3.1707 |
| 0.2627 | 3.7989 | 680 | 1.4170 | -0.5635 | 0.0 | 0.2250 | -0.5635 | -53.7126 | -35.7054 | -3.2556 | -3.2116 |
| 0.1742 | 3.9106 | 700 | 1.4822 | -0.6458 | 0.0 | 0.1500 | -0.6458 | -54.2694 | -36.5282 | -3.2417 | -3.1927 |
| 0.1349 | 4.0223 | 720 | 1.4830 | -0.6865 | 0.0 | 0.2000 | -0.6865 | -55.6546 | -36.9353 | -3.3105 | -3.2496 |
| 0.1227 | 4.1341 | 740 | 1.6660 | -0.9048 | 0.0 | 0.1500 | -0.9048 | -58.4858 | -39.1180 | -3.4280 | -3.3460 |
| 0.1293 | 4.2458 | 760 | 1.5319 | -0.7418 | 0.0 | 0.2000 | -0.7418 | -55.8039 | -37.4880 | -3.3607 | -3.2969 |
| 0.1172 | 4.3575 | 780 | 1.5251 | -0.7269 | 0.0 | 0.2250 | -0.7269 | -55.8217 | -37.3393 | -3.3338 | -3.2761 |
| 0.1683 | 4.4693 | 800 | 1.5403 | -0.7640 | 0.0 | 0.1500 | -0.7640 | -55.9445 | -37.7102 | -3.3597 | -3.2995 |
| 0.1904 | 4.5810 | 820 | 1.5914 | -0.8601 | 0.0 | 0.1750 | -0.8601 | -56.9242 | -38.6707 | -3.3721 | -3.3039 |
| 0.2634 | 4.6927 | 840 | 1.4943 | -0.7421 | 0.0 | 0.2000 | -0.7421 | -55.4652 | -37.4915 | -3.3267 | -3.2725 |
| 0.1922 | 4.8045 | 860 | 1.4566 | -0.7606 | 0.0 | 0.1250 | -0.7606 | -55.5023 | -37.6765 | -3.3876 | -3.3162 |
| 0.1825 | 4.9162 | 880 | 1.4759 | -0.7833 | 0.0 | 0.1250 | -0.7833 | -54.1260 | -37.9035 | -3.3873 | -3.3143 |
| 0.1293 | 5.0279 | 900 | 1.4924 | -0.7656 | 0.0 | 0.1500 | -0.7656 | -55.2890 | -37.7256 | -3.4025 | -3.3230 |
| 0.0969 | 5.1397 | 920 | 1.8492 | -1.0069 | 0.0 | 0.1250 | -1.0069 | -61.5237 | -40.1392 | -3.4959 | -3.4000 |
| 0.1006 | 5.2514 | 940 | 1.6623 | -0.8035 | 0.0 | 0.1500 | -0.8035 | -58.7658 | -38.1047 | -3.4757 | -3.3889 |
| 0.1129 | 5.3631 | 960 | 1.6896 | -0.8444 | 0.0 | 0.1750 | -0.8444 | -59.1995 | -38.5144 | -3.5036 | -3.4093 |
| 0.1248 | 5.4749 | 980 | 1.9180 | -1.1702 | 0.0 | 0.1000 | -1.1702 | -62.2443 | -41.7719 | -3.5884 | -3.4733 |
| 0.1063 | 5.5866 | 1000 | 1.7638 | -0.9265 | 0.0 | 0.1500 | -0.9265 | -59.8830 | -39.3352 | -3.5476 | -3.4446 |
| 0.1929 | 5.6983 | 1020 | 1.7843 | -0.9282 | 0.0 | 0.1250 | -0.9282 | -61.1666 | -39.3518 | -3.5739 | -3.4656 |
| 0.1503 | 5.8101 | 1040 | 1.6424 | -0.8562 | 0.0 | 0.1250 | -0.8562 | -59.2886 | -38.6320 | -3.5519 | -3.4453 |
| 0.1370 | 5.9218 | 1060 | 1.6859 | -0.7688 | 0.0 | 0.1500 | -0.7688 | -60.2979 | -37.7578 | -3.5396 | -3.4304 |
| 0.0841 | 6.0335 | 1080 | 1.7235 | -0.8587 | 0.0 | 0.1750 | -0.8587 | -61.1482 | -38.6569 | -3.5664 | -3.4532 |
| 0.0798 | 6.1453 | 1100 | 2.1241 | -1.2464 | 0.0 | 0.1500 | -1.2464 | -67.2607 | -42.5339 | -3.6434 | -3.5114 |
| 0.0996 | 6.2570 | 1120 | 2.1727 | -1.3262 | 0.0 | 0.1000 | -1.3262 | -68.6685 | -43.3320 | -3.6476 | -3.5174 |
| 0.1140 | 6.3687 | 1140 | 2.1072 | -1.2928 | 0.0 | 0.1000 | -1.2928 | -67.2377 | -42.9979 | -3.6253 | -3.4997 |
| 0.0937 | 6.4804 | 1160 | 2.0897 | -1.3032 | 0.0 | 0.1000 | -1.3032 | -67.8363 | -43.1023 | -3.6172 | -3.4898 |
| 0.0977 | 6.5922 | 1180 | 2.1033 | -1.2756 | 0.0 | 0.1250 | -1.2756 | -67.8044 | -42.8260 | -3.6423 | -3.5153 |
| 0.0816 | 6.7039 | 1200 | 2.0782 | -1.2335 | 0.0 | 0.1250 | -1.2335 | -66.4155 | -42.4054 | -3.6321 | -3.5036 |
| 0.0821 | 6.8156 | 1220 | 2.0229 | -1.1573 | 0.0 | 0.1750 | -1.1573 | -66.1469 | -41.6429 | -3.6116 | -3.4854 |
| 0.1400 | 6.9274 | 1240 | 2.0659 | -1.1567 | 0.0 | 0.1500 | -1.1567 | -66.6871 | -41.6369 | -3.6213 | -3.4943 |
| 0.0874 | 7.0391 | 1260 | 2.0813 | -1.1806 | 0.0 | 0.1750 | -1.1806 | -67.1844 | -41.8758 | -3.6316 | -3.5036 |
| 0.1099 | 7.1508 | 1280 | 2.1351 | -1.2467 | 0.0 | 0.1500 | -1.2467 | -67.7858 | -42.5375 | -3.6372 | -3.5070 |
| 0.0759 | 7.2626 | 1300 | 2.1856 | -1.3011 | 0.0 | 0.1500 | -1.3011 | -68.6701 | -43.0811 | -3.6452 | -3.5134 |
| 0.0823 | 7.3743 | 1320 | 2.2192 | -1.3394 | 0.0 | 0.1500 | -1.3394 | -69.4521 | -43.4637 | -3.6450 | -3.5105 |
| 0.1278 | 7.4860 | 1340 | 2.2384 | -1.3569 | 0.0 | 0.1500 | -1.3569 | -69.7412 | -43.6389 | -3.6481 | -3.5135 |
| 0.1139 | 7.5978 | 1360 | 2.2508 | -1.3602 | 0.0 | 0.1500 | -1.3602 | -69.8968 | -43.6718 | -3.6591 | -3.5274 |
| 0.0792 | 7.7095 | 1380 | 2.2559 | -1.3643 | 0.0 | 0.1500 | -1.3643 | -69.9685 | -43.7135 | -3.6531 | -3.5199 |
| 0.1072 | 7.8212 | 1400 | 2.2601 | -1.3847 | 0.0 | 0.1500 | -1.3847 | -69.9992 | -43.9171 | -3.6523 | -3.5175 |
| 0.1035 | 7.9330 | 1420 | 2.2658 | -1.3596 | 0.0 | 0.1750 | -1.3596 | -69.9729 | -43.6664 | -3.6490 | -3.5141 |
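
To eyeball the trajectories in this table (validation loss and reward accuracy over epochs), a short plotting sketch may help; it assumes the table has been exported to a CSV with the same column names:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical CSV export of the table above, one row per eval step.
df = pd.read_csv("training_results.csv")

fig, ax1 = plt.subplots()
ax1.plot(df["Epoch"], df["Validation Loss"], label="validation loss")
ax1.set_xlabel("epoch")
ax1.set_ylabel("validation loss")

# Second axis for accuracy, which lives on a 0-1 scale.
ax2 = ax1.twinx()
ax2.plot(df["Epoch"], df["Rewards/accuracies"], color="tab:orange",
         label="reward accuracy")
ax2.set_ylabel("reward accuracy")

fig.legend(loc="upper center")
plt.show()
```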

### Framework versions

- Transformers 4.45.2
- Pytorch 2.5.1+cu121
- Datasets 3.5.0
- Tokenizers 0.20.3
all_results.json ADDED
```json
{
    "epoch": 8.0,
    "total_flos": 0.0,
    "train_loss": 0.2510837823277412,
    "train_runtime": 6583.6658,
    "train_samples": 5700,
    "train_samples_per_second": 6.926,
    "train_steps_per_second": 0.218
}
```
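
These throughput figures are internally consistent: 5700 samples over 8 epochs with a global batch size of 32 gives 179 optimizer steps per epoch (1432 in total), which matches both reported rates. A quick check, assuming the usual ceil-division step count per epoch:

```python
import math

train_samples = 5700
epochs = 8.0
global_batch_size = 32           # total_train_batch_size from the card
runtime_s = 6583.6658

steps_per_epoch = math.ceil(train_samples / global_batch_size)  # 179
total_steps = steps_per_epoch * int(epochs)                     # 1432

print(train_samples * epochs / runtime_s)  # ~6.93 samples/s (reported: 6.926)
print(total_steps / runtime_s)             # ~0.218 steps/s  (reported: 0.218)
```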
generation_config.json ADDED
```json
{
    "bos_token_id": 151643,
    "do_sample": true,
    "eos_token_id": [
        151645,
        151643
    ],
    "pad_token_id": 151643,
    "repetition_penalty": 1.1,
    "temperature": 0.7,
    "top_k": 20,
    "top_p": 0.8,
    "transformers_version": "4.45.2"
}
```
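
Because the checkpoint ships this `generation_config.json`, `generate()` picks up the sampling settings automatically. A hedged usage sketch (the repo id is inferred from the commit author and model name and may differ):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "YuchenLi01/genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-05_beta0.1_epoch8.0_42"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

inputs = tokenizer("Write a haiku about preference learning.", return_tensors="pt")
# The saved config already sets do_sample=True, temperature=0.7, top_p=0.8,
# top_k=20, repetition_penalty=1.1; only max_new_tokens is set explicitly here.
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```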
train_results.json ADDED
```json
{
    "epoch": 8.0,
    "total_flos": 0.0,
    "train_loss": 0.2510837823277412,
    "train_runtime": 6583.6658,
    "train_samples": 5700,
    "train_samples_per_second": 6.926,
    "train_steps_per_second": 0.218
}
```
trainer_state.json ADDED
The diff for this file is too large to render.