Jayce-Ping committed on
Commit 8613f6c · verified · 1 Parent(s): 4657f72

Update README.md

Files changed (1)
  1. README.md +90 -2
README.md CHANGED
@@ -1,6 +1,7 @@
  ---
  pipeline_tag: image-text-to-text
  library_name: transformers
+ license: apache-2.0
  ---
 
  # PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling
@@ -25,13 +26,100 @@ PaCo-RL argues that reinforcement learning offers a promising alternative for le
  Extensive experiments show that PaCo-Reward significantly improves alignment with human perceptions of visual consistency, and PaCo-GRPO achieves state-of-the-art consistency performance with improved training efficiency and stability.
 
  <div align="center">
- <img src="https://github.com/X-GenGroup/PaCo-RL/raw/main/assets/readme_overview.png" alt="PaCo-RL Overview" width="800"/>
+ <img src="https://github.com/X-GenGroup/PaCo-RL/raw/main/assets/dataset_pipeline.png" alt="PaCo-RL Overview" width="800"/>
  </div>
 
- ## Quick Start
 
+
+ ## Example Usage
  For detailed installation, training of the reward model (PaCo-Reward), and running the full RL training (PaCo-GRPO), please refer to the [official GitHub repository](https://github.com/X-GenGroup/PaCo-RL). The repository provides comprehensive documentation for each component.
 
+ ```python
+ import torch
+ from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
+ from peft import PeftModel
+ from qwen_vl_utils import process_vision_info
+
+ # Load base model
+ base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
+     "Qwen/Qwen2.5-VL-7B-Instruct",
+     torch_dtype=torch.bfloat16,
+     device_map="auto"
+ )
+ processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
+
+ # Load LoRA adapter
+ model = PeftModel.from_pretrained(
+     base_model,
+     "X-GenGroup/PaCo-Reward-7B-Lora"
+ )
+
+ image1 = 'https://huggingface.co/X-GenGroup/PaCo-Reward-7B/resolve/main/images/image_1.jpg'
+ image2 = 'https://huggingface.co/X-GenGroup/PaCo-Reward-7B/resolve/main/images/image_2.jpg'
+
+ main_prompt = 'Generate multiple images portraying a medical scene of a dentist in scrubs. The images should include activities such as explaining oral hygiene to a patient, taking X-rays of teeth, cleaning teeth in a dental office, and filling a cavity during an appointment. The setting should depict a realistic dental clinic.'
+ text_prompt = (
+     f"Given two subfigures generated based on the theme: \"{main_prompt}\", "
+     f"do the two images maintain consistency in terms of style, logic and identity? "
+     f"Answer \"Yes\" and \"No\" first, and then provide detailed reasons."
+ )
+
+ # Example: Compare whether two images are visually consistent
+ messages_1 = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": image1},
+             {"type": "image", "image": image2},
+             {"type": "text", "text": text_prompt},
+         ],
+     }
+ ]
+
+ # Preparation for inference
+ text = processor.apply_chat_template(
+     messages_1, tokenize=False, add_generation_prompt=True
+ )
+ image_inputs, video_inputs = process_vision_info(messages_1)
+ inputs = processor(
+     text=[text],
+     images=image_inputs,
+     videos=video_inputs,
+     padding=True,
+     return_tensors="pt",
+ )
+ inputs = inputs.to("cuda")
+
+ # Inference: Calculate consistency score
+ # Get logits for first token
+ with torch.no_grad():
+     outputs = model(**inputs)
+     first_token_logits = outputs.logits[0, -1, :]  # Last position of prompt
+
+ # Get token IDs for "Yes" and "No"
+ yes_id = processor.tokenizer.encode("Yes", add_special_tokens=False)[0]
+ no_id = processor.tokenizer.encode("No", add_special_tokens=False)[0]
+
+ # Calculate probability
+ yes_logit = first_token_logits[yes_id]
+ no_logit = first_token_logits[no_id]
+ yes_prob = torch.exp(yes_logit) / (torch.exp(yes_logit) + torch.exp(no_logit))
+
+ # PaCo-Reward-7B and this model may differ in scores due to numerical precision
+ print(f"Consistency Score (Yes Conditional Probability): {yes_prob.item():.4f}")
+
+ # Inference: Generate detailed reasons
+ generated_ids = model.generate(**inputs, max_new_tokens=512)
+ generated_ids_trimmed = [
+     out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+ ]
+ output_text = processor.batch_decode(
+     generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
+ )
+ print(output_text[0])
+ ```
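
Review note: the consistency score computed in the added snippet is a two-way softmax over the "Yes" and "No" logits, which is the same as a sigmoid of their difference. The sketch below shows that identity in plain Python, assuming the two logits are available as ordinary floats. The helper name `yes_probability` is hypothetical and is not part of the PaCo-RL repository or of `transformers`.

```python
import math


def yes_probability(yes_logit: float, no_logit: float) -> float:
    """Two-way softmax over the "Yes"/"No" logits, i.e. sigmoid(yes - no).

    Hypothetical helper for illustration only; not part of PaCo-RL.
    """
    diff = yes_logit - no_logit
    # Branch on the sign so math.exp only sees a non-positive argument,
    # which avoids overflow when the two logits are far apart.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


if __name__ == "__main__":
    # A "Yes" logit two units above the "No" logit.
    print(f"{yes_probability(2.0, 0.0):.4f}")  # 0.8808
```

Working with the logit difference gives the same value as `exp(yes) / (exp(yes) + exp(no))` while never exponentiating a large positive number. With tensors, `torch.sigmoid(yes_logit - no_logit)` should be an equivalent form, though that substitution is a suggestion here and not something the commit does.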
+
+
  ## Model Zoo
 
  The PaCo-RL framework includes several models available on Hugging Face: