# dpo_40k_abla_all_eight
This model is a LoRA fine-tuned version of Qwen2.5-VL-7B-Instruct (loaded from the local path /p/scratch/taco-vlm/xiao4/models/Qwen2.5-VL-7B-Instruct), trained with DPO on the dpo_ablation_all_eight dataset. It achieves the following results on the evaluation set:
- Loss: 0.5260
- Rewards/chosen: -0.5688
- Rewards/rejected: -1.2177
- Rewards/accuracies: 0.7250
- Rewards/margins: 0.6490
- Logps/chosen: -37.6179
- Logps/rejected: -48.8116
- Logits/chosen: 0.2010
- Logits/rejected: 0.1913
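The reward columns above are the implicit DPO rewards: for each response, beta times the gap between the policy and reference log-probabilities. A minimal sketch of how these metrics relate, assuming TRL's default beta of 0.1 (the beta actually used for this run is not reported in this card):

```python
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Implicit DPO rewards and loss from per-response log-probabilities."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)        # rewards/chosen
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)  # rewards/rejected
    margins = chosen_rewards - rejected_rewards                             # rewards/margins
    loss = -F.logsigmoid(margins).mean()                                    # loss
    accuracy = (margins > 0).float().mean()                                 # rewards/accuracies
    return loss, chosen_rewards, rejected_rewards, accuracy
```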
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 4
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
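These flags give an effective batch size of 2 per device × 4 GPUs × 8 gradient-accumulation steps = 64. A sketch of the same hyperparameters expressed as a TRL `DPOConfig` (the actual training script is not included in this card, and the `output_dir` is illustrative):

```python
from trl import DPOConfig

training_args = DPOConfig(
    output_dir="qwen2_5vl7b-dpo_40k_abla_all_eight-lora",  # illustrative
    learning_rate=5e-6,
    per_device_train_batch_size=2,   # x 4 GPUs x 8 accumulation steps = 64
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    seed=42,
)
```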
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6888 | 0.0806 | 50 | 0.6891 | -0.0097 | -0.0183 | 0.5800 | 0.0086 | -32.0277 | -36.8175 | 0.4651 | 0.4536 |
| 0.6637 | 0.1612 | 100 | 0.6643 | -0.1036 | -0.1689 | 0.6900 | 0.0653 | -32.9663 | -38.3233 | 0.4729 | 0.4553 |
| 0.6275 | 0.2418 | 150 | 0.6287 | -0.2312 | -0.3882 | 0.6650 | 0.1570 | -34.2426 | -40.5160 | 0.4366 | 0.4244 |
| 0.5805 | 0.3225 | 200 | 0.5982 | -0.3261 | -0.5805 | 0.7050 | 0.2545 | -35.1910 | -42.4393 | 0.3993 | 0.3911 |
| 0.5132 | 0.4031 | 250 | 0.5752 | -0.3879 | -0.7462 | 0.7050 | 0.3583 | -35.8094 | -44.0962 | 0.3547 | 0.3346 |
| 0.5218 | 0.4837 | 300 | 0.5598 | -0.3934 | -0.8424 | 0.7250 | 0.4490 | -35.8645 | -45.0584 | 0.3146 | 0.2951 |
| 0.449 | 0.5643 | 350 | 0.5505 | -0.4804 | -1.0050 | 0.7250 | 0.5246 | -36.7344 | -46.6842 | 0.2733 | 0.2540 |
| 0.4075 | 0.6449 | 400 | 0.5391 | -0.4772 | -1.0612 | 0.7150 | 0.5840 | -36.7021 | -47.2460 | 0.2404 | 0.2324 |
| 0.5689 | 0.7255 | 450 | 0.5325 | -0.5299 | -1.1545 | 0.7150 | 0.6246 | -37.2289 | -48.1790 | 0.2281 | 0.2155 |
| 0.4456 | 0.8061 | 500 | 0.5280 | -0.5577 | -1.1977 | 0.7200 | 0.6400 | -37.5073 | -48.6110 | 0.2110 | 0.1989 |
| 0.5101 | 0.8867 | 550 | 0.5262 | -0.5693 | -1.2199 | 0.7300 | 0.6505 | -37.6238 | -48.8327 | 0.2072 | 0.1892 |
| 0.4293 | 0.9674 | 600 | 0.5246 | -0.5665 | -1.2212 | 0.7300 | 0.6547 | -37.5951 | -48.8461 | 0.2064 | 0.1921 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.49.0
- Pytorch 2.5.1+cu124
- Datasets 4.0.0
- Tokenizers 0.21.0
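To use the adapter, load the base checkpoint and attach the LoRA weights with PEFT. A minimal sketch, assuming the adapter is published as `xiaorui638/qwen2_5vl7b-dpo_40k_abla_all_eight-lora` and that the public `Qwen/Qwen2.5-VL-7B-Instruct` checkpoint matches the local base path used for training:

```python
from peft import PeftModel
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Load the base vision-language model and its processor.
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

# Attach the DPO-trained LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(
    base, "xiaorui638/qwen2_5vl7b-dpo_40k_abla_all_eight-lora"
)
model.eval()
```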