---
library_name: transformers
pipeline_tag: image-text-to-text
license: apache-2.0
---
# PaCo-Reward-7B: A Pairwise Consistency Evaluator from the PaCo-RL Framework
<div align="center">
<a href='https://arxiv.org/abs/2512.04784'><img src='https://img.shields.io/badge/ArXiv-red?logo=arxiv'></a>
<a href='https://x-gengroup.github.io/HomePage_PaCo-RL/'><img src='https://img.shields.io/badge/ProjectPage-purple?logo=github'></a>
<a href="https://github.com/X-GenGroup/PaCo-RL"><img src="https://img.shields.io/badge/Code-9E95B7?logo=github"></a>
<a href='https://huggingface.co/collections/X-GenGroup/paco-rl'><img src='https://img.shields.io/badge/Data & Model-green?logo=huggingface'></a>
</div>
This repository contains **PaCo-Reward-7B**, a key component of the **PaCo-RL** framework, as presented in the paper:
[**PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling**](https://huggingface.co/papers/2512.04784)
The **PaCo-RL** framework is designed for consistent image generation through reinforcement learning, aiming to preserve identities, styles, and logical coherence across multiple images for applications like storytelling and character design. **PaCo-Reward-7B** specifically acts as a pairwise consistency evaluator. It is trained on a large-scale dataset constructed via automated sub-figure pairing and evaluates consistency through a generative, autoregressive scoring mechanism, enhanced by task-aware instructions and Chain-of-Thought (CoT) reasoning.
- **Project Page:** https://x-gengroup.github.io/HomePage_PaCo-RL/
- **Code Repository:** https://github.com/X-GenGroup/PaCo-RL
## Overview
**PaCo-RL** is a comprehensive framework for consistent image generation through reinforcement learning, addressing challenges in preserving identities, styles, and logical coherence across multiple images for storytelling and character design applications.
### Key Components
- **PaCo-Reward**: A pairwise consistency evaluator with task-aware instruction and CoT reasoning.
- **PaCo-GRPO**: Efficient RL optimization with resolution-decoupled training and log-tamed multi-reward aggregation (a sketch of the aggregation idea follows the figure below).
<div align="center">
<img src="https://github.com/X-GenGroup/PaCo-RL/raw/main/assets/dataset_pipeline.png" alt="PaCo-RL Overview" width="800"/>
</div>
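The exact aggregation rule lives in the paper and code; the intuition is that a log transform compresses each reward's dynamic range so no single reward dominates the combined signal. A minimal sketch, assuming probability-like rewards in (0, 1] (such as the Yes-probability computed in the example below); the function name and formula here are illustrative, not the paper's exact implementation:

```python
import math

def log_tamed_aggregate(rewards, eps=1e-6):
    """Illustrative log-tamed aggregation of several rewards.

    Assumes each reward is a probability-like score in (0, 1]; `eps`
    guards against log(0). NOT the paper's exact formula.
    """
    # Averaging log-rewards equals the log of the geometric mean:
    # one poor reward drags the aggregate down, but no single large
    # reward can dominate the others.
    return sum(math.log(max(r, eps)) for r in rewards) / len(rewards)

# Example: combine a consistency reward with a text-alignment reward
print(log_tamed_aggregate([0.92, 0.71]))
```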
## Example Usage
PaCo-Reward-7B is fine-tuned from [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct), so it loads the same way as the base model:
```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the model in bfloat16 and let accelerate place it on available GPUs
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "X-GenGroup/PaCo-Reward-7B", torch_dtype=torch.bfloat16, device_map="auto"
)

# Default processor (tokenizer + image preprocessor)
processor = AutoProcessor.from_pretrained("X-GenGroup/PaCo-Reward-7B")
image1 = 'https://huggingface.co/X-GenGroup/PaCo-Reward-7B/resolve/main/images/image_1.jpg'
image2 = 'https://huggingface.co/X-GenGroup/PaCo-Reward-7B/resolve/main/images/image_2.jpg'
main_prompt = 'Generate multiple images portraying a medical scene of a dentist in scrubs. The images should include activities such as explaining oral hygiene to a patient, taking X-rays of teeth, cleaning teeth in a dental office, and filling a cavity during an appointment. The setting should depict a realistic dental clinic.'
text_prompt = (
    f"Given two subfigures generated based on the theme: \"{main_prompt}\", "
    f"do the two images maintain consistency in terms of style, logic and identity? "
    f"Answer \"Yes\" and \"No\" first, and then provide detailed reasons."
)
# Example: Compare whether two images are visually consistent
messages_1 = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image1},
            {"type": "image", "image": image2},
            {"type": "text", "text": text_prompt},
        ],
    }
]
# Preparation for inference
text = processor.apply_chat_template(
    messages_1, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages_1)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)
# Inference: calculate the consistency score
with torch.no_grad():
    outputs = model(**inputs)
# Logits at the last prompt position predict the first generated token
first_token_logits = outputs.logits[0, -1, :]
# Get token IDs for "Yes" and "No"
yes_id = processor.tokenizer.encode("Yes", add_special_tokens=False)[0]
no_id = processor.tokenizer.encode("No", add_special_tokens=False)[0]
# Renormalize over the {"Yes", "No"} pair with a numerically stable softmax
yes_logit = first_token_logits[yes_id]
no_logit = first_token_logits[no_id]
yes_prob = torch.softmax(torch.stack([yes_logit, no_logit]), dim=0)[0]
print(f"Consistency Score (Yes Conditional Probability): {yes_prob.item():.4f}")
# Inference: Generate detailed reasons
generated_ids = model.generate(**inputs, max_new_tokens=512)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text[0])
```
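To score many candidate pairs, the logit-probing steps above can be wrapped in a small helper. A minimal sketch (the helper name `consistency_score` is ours, not part of the repo), reusing the `model` and `processor` objects loaded above:

```python
def consistency_score(image_a, image_b, theme):
    """Hypothetical convenience wrapper around the scoring steps above.

    Returns P("Yes") over {"Yes", "No"} for one image pair; assumes the
    `model` and `processor` objects from the example are in scope.
    """
    prompt = (
        f"Given two subfigures generated based on the theme: \"{theme}\", "
        f"do the two images maintain consistency in terms of style, logic and identity? "
        f"Answer \"Yes\" and \"No\" first, and then provide detailed reasons."
    )
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_a},
                {"type": "image", "image": image_b},
                {"type": "text", "text": prompt},
            ],
        }
    ]
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    ).to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1, :]
    yes_id = processor.tokenizer.encode("Yes", add_special_tokens=False)[0]
    no_id = processor.tokenizer.encode("No", add_special_tokens=False)[0]
    return torch.softmax(torch.stack([logits[yes_id], logits[no_id]]), dim=0)[0].item()

# Example: score the same pair as above
print(consistency_score(image1, image2, main_prompt))
```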
## Model Zoo
This model is part of a larger collection of models within the PaCo-RL framework. More related models can be found in the [PaCo-RL Hugging Face collection](https://huggingface.co/collections/X-GenGroup/paco-rl).
| Model | Type | HuggingFace |
| :---------------------- | :------------------ | :--------------------------------------------------------- |
| **PaCo-Reward-7B** | Reward Model | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-Reward-7B) |
| **PaCo-Reward-7B-Lora** | Reward Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-Reward-7B-Lora) |
| **PaCo-FLUX.1-dev** | T2I Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-FLUX.1-dev-Lora) |
| **PaCo-FLUX.1-Kontext-dev** | Image Editing Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-FLUX.1-Kontext-Lora) |
| **PaCo-QwenImage-Edit** | Image Editing Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-Qwen-Image-Edit-Lora) |
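The LoRA reward variant can be attached to the base Qwen2.5-VL model with PEFT. A minimal sketch, assuming `X-GenGroup/PaCo-Reward-7B-Lora` ships a standard PEFT adapter (check that repo for the exact loading recipe):

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration
from peft import PeftModel

# Load the base model, then attach the PaCo-Reward LoRA adapter.
# Assumption: the Lora repo is a standard PEFT adapter checkpoint.
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "X-GenGroup/PaCo-Reward-7B-Lora")
```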
## Citation
If you find our work helpful or inspiring, please feel free to cite it:
```bibtex
@misc{ping2025pacorladvancingreinforcementlearning,
title={PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling},
author={Bowen Ping and Chengyou Jia and Minnan Luo and Changliang Xia and Xin Shen and Zhuohang Dang and Hangwei Qian},
year={2025},
eprint={2512.04784},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.04784},
}
```