Add model card for PaCo-RL (#1)
- Add model card for PaCo-RL (863f6234ecda69e664b8b025d4d81ea599eaff91)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md ADDED
@@ -0,0 +1,85 @@
---
license: apache-2.0
pipeline_tag: image-to-image
library_name: diffusers
---

# PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling

This repository contains the official implementation of **PaCo-RL**, a comprehensive framework for consistent image generation.

[📚 Paper](https://huggingface.co/papers/2512.04784) | [🌐 Project Page](https://x-gengroup.github.io/HomePage_PaCo-RL/) | [💻 Code](https://github.com/X-GenGroup/PaCo-RL) | [🤗 Models & Data](https://huggingface.co/collections/X-GenGroup/paco-rl)

PaCo-RL aims to preserve identities, styles, and logical coherence across multiple images, which is essential for applications such as storytelling and character design. It leverages reinforcement learning to learn complex and subjective visual criteria without large-scale datasets, by combining a specialized consistency reward model (PaCo-Reward) with an efficient RL algorithm (PaCo-GRPO).

## Key Components

- **PaCo-Reward**: A pairwise consistency evaluator trained on a large-scale dataset constructed via automated sub-figure pairing. It evaluates consistency through a generative, autoregressive scoring mechanism enhanced by task-aware instructions and CoT reasoning.
- **PaCo-GRPO**: An efficient RL optimization strategy that leverages a novel resolution-decoupled optimization to substantially reduce RL cost, alongside a log-tamed multi-reward aggregation mechanism that ensures balanced and stable reward optimization (an illustrative sketch of the aggregation idea follows this list).

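
The exact form of PaCo-GRPO's log-tamed multi-reward aggregation is specified in the paper and the training code; purely as an illustration of the general idea, the hypothetical Python sketch below applies a log transform to each reward before combining them, so that no single reward signal dominates the aggregate. The function name, weights, and use of `log1p` are assumptions for this sketch, not the official implementation.

```python
import math

def log_tamed_aggregate(rewards, weights=None):
    """Illustrative sketch only: combine several reward scores after a log transform.

    The log compresses large individual rewards so that one strong signal
    (e.g. an aesthetic score) cannot drown out the others (e.g. consistency).
    This is NOT the official PaCo-GRPO aggregation.
    """
    if weights is None:
        weights = [1.0] * len(rewards)
    assert len(weights) == len(rewards), "one weight per reward"
    # log1p keeps small rewards roughly linear while taming large ones
    tamed = [w * math.log1p(max(r, 0.0)) for w, r in zip(weights, rewards)]
    return sum(tamed) / sum(weights)

# Example: a consistency reward and an aesthetic reward on different scales
print(log_tamed_aggregate([0.8, 5.0], weights=[1.0, 0.5]))
```
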
## 🚀 Quick Start

For detailed instructions on installation, training the reward model, and running RL training, please refer to the [GitHub repository](https://github.com/X-GenGroup/PaCo-RL).

### Installation
```bash
git clone https://github.com/X-GenGroup/PaCo-RL.git
cd PaCo-RL
```

### Train Reward Model
```bash
cd PaCo-Reward
conda create -n paco-reward python=3.12 -y
conda activate paco-reward
cd LLaMA-Factory && pip install -e ".[torch,metrics]" --no-build-isolation
cd .. && bash train/paco_reward.sh
```

### Run RL Training
```bash
cd PaCo-GRPO
conda create -n paco-grpo python=3.12 -y
conda activate paco-grpo
pip install -e .

# Set up the vLLM reward server
conda create -n vllm python=3.12 -y
conda activate vllm && pip install vllm
export CUDA_VISIBLE_DEVICES=0
export VLLM_MODEL_PATHS='X-GenGroup/PaCo-Reward-7B'
export VLLM_MODEL_NAMES='Paco-Reward-7B'
bash vllm_server/launch.sh

# Start training
export CUDA_VISIBLE_DEVICES=1,2,3,4,5,6,7
conda activate paco-grpo
bash scripts/single_node/train_flux.sh t2is
```

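`bash vllm_server/launch.sh` serves the PaCo-Reward model with vLLM so the RL trainer can query it during training. The query format the trainer actually uses is defined in the repository; the sketch below is only a hedged illustration that assumes the server exposes vLLM's standard OpenAI-compatible endpoint on `localhost:8000` and accepts a free-form pairwise prompt. The port, prompt wording, and response handling are assumptions, not the official client.

```python
# Hypothetical sketch of querying the reward server once it is running.
# Assumes vLLM's OpenAI-compatible API on localhost:8000; the prompt below is
# illustrative and not the official PaCo-Reward query template.
import base64
from openai import OpenAI

def to_data_url(path: str) -> str:
    """Encode a local image as a data URL the chat API can consume."""
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode()

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="Paco-Reward-7B",  # matches VLLM_MODEL_NAMES above
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": to_data_url("image_a.png")}},
            {"type": "image_url", "image_url": {"url": to_data_url("image_b.png")}},
            {"type": "text", "text": "Rate the consistency of identity and style between these two images."},
        ],
    }],
)
print(response.choices[0].message.content)
```
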
## 🎁 Model Zoo

| Model | Type | HuggingFace |
|-------|------|-------------|
| **PaCo-Reward-7B** | Reward Model | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-Reward-7B) |
| **PaCo-Reward-7B-Lora** | Reward Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-Reward-7B-Lora) |
| **PaCo-FLUX.1-dev** | T2I Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-FLUX.1-dev-Lora) |
| **PaCo-FLUX.1-Kontext-dev** | Image Editing Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-FLUX.1-Kontext-Lora) |
| **PaCo-QwenImage-Edit** | Image Editing Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-Qwen-Image-Edit-Lora) |

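The LoRA checkpoints above are intended to be used with `diffusers`. The following is a minimal sketch of loading the text-to-image LoRA, assuming the weights are compatible with `load_lora_weights` and that you have access to the gated `black-forest-labs/FLUX.1-dev` base model; the prompt and sampling settings are placeholders, and the GitHub repository documents the officially supported inference path.

```python
# Minimal sketch: applying the PaCo FLUX LoRA with diffusers.
# Assumes the checkpoint loads via load_lora_weights; see the repo for the
# official inference scripts.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("X-GenGroup/PaCo-FLUX.1-dev-Lora")
pipe.to("cuda")

# Placeholder prompt: multi-panel generation is where consistency matters most
image = pipe(
    "A four-panel storyboard of the same red-haired explorer, consistent character design and art style",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("paco_flux_sample.png")
```
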
## Acknowledgement

Our work is built upon [Flow-GRPO](https://github.com/yifan123/flow_grpo), [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), [vLLM](https://github.com/vllm-project/vllm), and [Qwen2.5-VL](https://github.com/QwenLM/Qwen3-VL). We sincerely thank the authors for their valuable contributions to the community.

## Citation

If you find our work helpful or inspiring, please feel free to cite it:

```bibtex
@misc{ping2025pacorladvancingreinforcementlearning,
      title={PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling},
      author={Bowen Ping and Chengyou Jia and Minnan Luo and Changliang Xia and Xin Shen and Zhuohang Dang and Hangwei Qian},
      year={2025},
      eprint={2512.04784},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.04784},
}
```