---
library_name: transformers
pipeline_tag: image-text-to-text
license: apache-2.0
---

# PaCo-Reward-7B: A Pairwise Consistency Evaluator from the PaCo-RL Framework

<div align="center">
  <a href='https://arxiv.org/abs/2512.04784'><img src='https://img.shields.io/badge/ArXiv-red?logo=arxiv'></a>  &nbsp;
  <a href='https://x-gengroup.github.io/HomePage_PaCo-RL/'><img src='https://img.shields.io/badge/ProjectPage-purple?logo=github'></a> &nbsp;
  <a href="https://github.com/X-GenGroup/PaCo-RL"><img src="https://img.shields.io/badge/Code-9E95B7?logo=github"></a> &nbsp; 
  <a href='https://huggingface.co/collections/X-GenGroup/paco-rl'><img src='https://img.shields.io/badge/Data & Model-green?logo=huggingface'></a> &nbsp;
</div>

This repository contains **PaCo-Reward-7B**, a key component of the **PaCo-RL** framework, as presented in the paper:
[**PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling**](https://huggingface.co/papers/2512.04784)

The **PaCo-RL** framework is designed for consistent image generation through reinforcement learning, aiming to preserve identities, styles, and logical coherence across multiple images for applications like storytelling and character design. **PaCo-Reward-7B** specifically acts as a pairwise consistency evaluator. It is trained on a large-scale dataset constructed via automated sub-figure pairing and evaluates consistency through a generative, autoregressive scoring mechanism, enhanced by task-aware instructions and Chain-of-Thought (CoT) reasoning.

-   **Project Page:** https://x-gengroup.github.io/HomePage_PaCo-RL/
-   **Code Repository:** https://github.com/X-GenGroup/PaCo-RL

## 🌟 Overview

**PaCo-RL** is a comprehensive framework for consistent image generation through reinforcement learning, addressing challenges in preserving identities, styles, and logical coherence across multiple images for storytelling and character design applications.

### Key Components

-   **PaCo-Reward**: A pairwise consistency evaluator with task-aware instructions and CoT reasoning.
-   **PaCo-GRPO**: Efficient RL optimization with resolution-decoupled training and log-tamed multi-reward aggregation (a sketch of the aggregation idea follows the figure below).

<div align="center">
  <img src="https://github.com/X-GenGroup/PaCo-RL/raw/main/assets/dataset_pipeline.png" alt="PaCo-RL Overview" width="800"/>
</div>
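
The exact aggregation rule used by PaCo-GRPO is defined in the paper and training code; as a rough illustration of what "log-tamed" aggregation means, the hypothetical helper below combines several per-criterion rewards by averaging in log space (a geometric mean), which keeps any single outlier reward from dominating. This is a minimal sketch under our own assumptions, not the framework's implementation:

```python
import math

def aggregate_rewards(rewards: dict, eps: float = 1e-6) -> float:
    """Hypothetical log-tamed aggregation: a geometric mean of per-criterion
    rewards in [0, 1]. Averaging log-rewards damps the influence of any
    single outlier score; the actual PaCo-GRPO rule may differ."""
    logs = [math.log(max(r, eps)) for r in rewards.values()]
    return math.exp(sum(logs) / len(logs))

# Example: combining a PaCo-Reward consistency score with other signals
print(aggregate_rewards({"consistency": 0.91, "aesthetics": 0.75, "text_align": 0.83}))
```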

## Example Usage

PaCo-Reward-7B is fine-tuned from [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct), so it can be loaded in the same way:

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "X-GenGroup/PaCo-Reward-7B", torch_dtype="bfloat16", device_map="auto"
)

# Default processor (handles chat templating and image preprocessing)
processor = AutoProcessor.from_pretrained("X-GenGroup/PaCo-Reward-7B")

image1 = 'https://huggingface.co/X-GenGroup/PaCo-Reward-7B/resolve/main/images/image_1.jpg'
image2 = 'https://huggingface.co/X-GenGroup/PaCo-Reward-7B/resolve/main/images/image_2.jpg'

main_prompt = 'Generate multiple images portraying a medical scene of a dentist in scrubs. The images should include activities such as explaining oral hygiene to a patient, taking X-rays of teeth, cleaning teeth in a dental office, and filling a cavity during an appointment. The setting should depict a realistic dental clinic.'
text_prompt = (
    f"Given two subfigures generated based on the theme: \"{main_prompt}\", "
    f"do the two images maintain consistency in terms of style, logic and identity? "
    f"Answer \"Yes\" and \"No\" first, and then provide detailed reasons."
)

# Example: Compare whether two images are visually consistent
messages_1 = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image1},
            {"type": "image", "image": image2},
            {"type": "text", "text": text_prompt},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages_1, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages_1)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)  # follow the model's placement under device_map="auto"

# Inference: calculate the consistency score without sampling.
# The logits at the last prompt position give the distribution over the
# first generated token, which the model is trained to answer as "Yes"/"No".
with torch.no_grad():
    outputs = model(**inputs)
    first_token_logits = outputs.logits[0, -1, :]

# Get token IDs for "Yes" and "No"
yes_id = processor.tokenizer.encode("Yes", add_special_tokens=False)[0]
no_id = processor.tokenizer.encode("No", add_special_tokens=False)[0]

# Normalize over the two answer logits (softmax is numerically safer than raw exp)
yes_prob = torch.softmax(
    torch.stack([first_token_logits[yes_id], first_token_logits[no_id]]), dim=0
)[0]

print(f"Consistency Score (Yes Conditional Probability): {yes_prob.item():.4f}")

# Inference: Generate detailed reasons
generated_ids = model.generate(**inputs, max_new_tokens=512)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text[0])
```
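
For a set of N generated subfigures, an overall score can be obtained by scoring every pair and averaging; this aggregation choice is ours for illustration, not prescribed by the model card. The `score_pair` helper below is a hypothetical wrapper that reuses `model`, `processor`, `process_vision_info`, `main_prompt`, and the images from the snippet above:

```python
from itertools import combinations

def score_pair(image_a, image_b, theme):
    """Hypothetical helper: returns P("Yes") for one image pair,
    wrapping the scoring steps shown above."""
    prompt = (
        f"Given two subfigures generated based on the theme: \"{theme}\", "
        f"do the two images maintain consistency in terms of style, logic and identity? "
        f"Answer \"Yes\" and \"No\" first, and then provide detailed reasons."
    )
    messages = [{"role": "user", "content": [
        {"type": "image", "image": image_a},
        {"type": "image", "image": image_b},
        {"type": "text", "text": prompt},
    ]}]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, video_inputs = process_vision_info(messages)
    batch = processor(text=[text], images=image_inputs, videos=video_inputs,
                      padding=True, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**batch).logits[0, -1, :]
    yes_id = processor.tokenizer.encode("Yes", add_special_tokens=False)[0]
    no_id = processor.tokenizer.encode("No", add_special_tokens=False)[0]
    return torch.softmax(torch.stack([logits[yes_id], logits[no_id]]), dim=0)[0].item()

# Average pairwise consistency over all pairs in a generated set
images = [image1, image2]  # extend with more subfigures as needed
pair_scores = [score_pair(a, b, main_prompt) for a, b in combinations(images, 2)]
print(f"Mean pairwise consistency: {sum(pair_scores) / len(pair_scores):.4f}")
```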

## 🎁 Model Zoo

This model is part of a larger collection of models within the PaCo-RL framework. More related models can be found in the [PaCo-RL Hugging Face collection](https://huggingface.co/collections/X-GenGroup/paco-rl).

| Model                   | Type                | HuggingFace                                                |
| :---------------------- | :------------------ | :--------------------------------------------------------- |
| **PaCo-Reward-7B**      | Reward Model        | [πŸ€— Link](https://huggingface.co/X-GenGroup/PaCo-Reward-7B)      |
| **PaCo-Reward-7B-Lora** | Reward Model (LoRA) | [πŸ€— Link](https://huggingface.co/X-GenGroup/PaCo-Reward-7B-Lora) |
| **PaCo-FLUX.1-dev**     | T2I Model (LoRA)    | [πŸ€— Link](https://huggingface.co/X-GenGroup/PaCo-FLUX.1-dev-Lora) |
| **PaCo-FLUX.1-Kontext-dev** | Image Editing Model (LoRA) | [πŸ€— Link](https://huggingface.co/X-GenGroup/PaCo-FLUX.1-Kontext-Lora) |
| **PaCo-QwenImage-Edit** | Image Editing Model (LoRA) | [πŸ€— Link](https://huggingface.co/X-GenGroup/PaCo-Qwen-Image-Edit-Lora) |

## ⭐ Citation

If you find our work helpful or inspiring, please feel free to cite it:

```bibtex
@misc{ping2025pacorladvancingreinforcementlearning,
      title={PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling}, 
      author={Bowen Ping and Chengyou Jia and Minnan Luo and Changliang Xia and Xin Shen and Zhuohang Dang and Hangwei Qian},
      year={2025},
      eprint={2512.04784},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.04784}, 
}
```