dpo_40k_abla_all_eight

This model is a LoRA adapter fine-tuned with DPO from /p/scratch/taco-vlm/xiao4/models/Qwen2.5-VL-7B-Instruct (a local copy of Qwen2.5-VL-7B-Instruct) on the dpo_ablation_all_eight dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5260
  • Rewards/chosen: -0.5688
  • Rewards/rejected: -1.2177
  • Rewards/accuracies: 0.7250
  • Rewards/margins: 0.6490
  • Logps/chosen: -37.6179
  • Logps/rejected: -48.8116
  • Logits/chosen: 0.2010
  • Logits/rejected: 0.1913
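
For reference, these reward metrics follow the standard DPO formulation (a sketch assuming TRL-style metric definitions; the β used in training is not stated in this card):

```latex
% Standard DPO objective (Rafailov et al., 2023). The implicit reward of a
% response y given prompt x is r(x, y) = beta * log( pi_theta(y|x) / pi_ref(y|x) ).
\mathcal{L}_{\mathrm{DPO}}
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

Under these definitions, Rewards/chosen and Rewards/rejected are the eval-set means of the implicit reward for the chosen and rejected responses, Rewards/accuracies is the fraction of pairs where the chosen reward exceeds the rejected one, and Rewards/margins is their difference: -0.5688 - (-1.2177) = 0.6489 ≈ 0.6490 (up to rounding).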

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 4
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0
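
The card does not name the training framework that produced these settings; as a reference, here is a minimal sketch of how they would map onto TRL's `DPOConfig` (the `output_dir` value is an assumption, not taken from this card):

```python
from trl import DPOConfig

# Hypothetical mapping of the hyperparameters above onto TRL's DPOConfig.
# Effective train batch size: 2 per device * 4 GPUs * 8 accumulation steps = 64.
config = DPOConfig(
    output_dir="dpo_40k_abla_all_eight",  # assumption: output path not stated in the card
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```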

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6888 | 0.0806 | 50 | 0.6891 | -0.0097 | -0.0183 | 0.5800 | 0.0086 | -32.0277 | -36.8175 | 0.4651 | 0.4536 |
| 0.6637 | 0.1612 | 100 | 0.6643 | -0.1036 | -0.1689 | 0.6900 | 0.0653 | -32.9663 | -38.3233 | 0.4729 | 0.4553 |
| 0.6275 | 0.2418 | 150 | 0.6287 | -0.2312 | -0.3882 | 0.6650 | 0.1570 | -34.2426 | -40.5160 | 0.4366 | 0.4244 |
| 0.5805 | 0.3225 | 200 | 0.5982 | -0.3261 | -0.5805 | 0.7050 | 0.2545 | -35.1910 | -42.4393 | 0.3993 | 0.3911 |
| 0.5132 | 0.4031 | 250 | 0.5752 | -0.3879 | -0.7462 | 0.7050 | 0.3583 | -35.8094 | -44.0962 | 0.3547 | 0.3346 |
| 0.5218 | 0.4837 | 300 | 0.5598 | -0.3934 | -0.8424 | 0.7250 | 0.4490 | -35.8645 | -45.0584 | 0.3146 | 0.2951 |
| 0.449 | 0.5643 | 350 | 0.5505 | -0.4804 | -1.0050 | 0.7250 | 0.5246 | -36.7344 | -46.6842 | 0.2733 | 0.2540 |
| 0.4075 | 0.6449 | 400 | 0.5391 | -0.4772 | -1.0612 | 0.7150 | 0.5840 | -36.7021 | -47.2460 | 0.2404 | 0.2324 |
| 0.5689 | 0.7255 | 450 | 0.5325 | -0.5299 | -1.1545 | 0.7150 | 0.6246 | -37.2289 | -48.1790 | 0.2281 | 0.2155 |
| 0.4456 | 0.8061 | 500 | 0.5280 | -0.5577 | -1.1977 | 0.7200 | 0.6400 | -37.5073 | -48.6110 | 0.2110 | 0.1989 |
| 0.5101 | 0.8867 | 550 | 0.5262 | -0.5693 | -1.2199 | 0.7300 | 0.6505 | -37.6238 | -48.8327 | 0.2072 | 0.1892 |
| 0.4293 | 0.9674 | 600 | 0.5246 | -0.5665 | -1.2212 | 0.7300 | 0.6547 | -37.5951 | -48.8461 | 0.2064 | 0.1921 |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.49.0
  • Pytorch 2.5.1+cu124
  • Datasets 4.0.0
  • Tokenizers 0.21.0
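
Since this repository ships a PEFT LoRA adapter rather than full model weights, inference requires loading the base model first and attaching the adapter on top. A minimal sketch, assuming the public Hub checkpoint Qwen/Qwen2.5-VL-7B-Instruct matches the local base model used for training:

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from peft import PeftModel

# Load the base model (assumption: the Hub checkpoint matches the local
# training copy) and apply this repository's LoRA adapter on top of it.
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(
    base, "xiaorui638/qwen2_5vl7b-dpo_40k_abla_all_eight-lora"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
```

Calling `model.merge_and_unload()` afterwards folds the adapter into the base weights, removing the adapter indirection at inference time.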