Jayce-Ping nielsr HF Staff committed on
Commit 1bf9d9d · verified · 1 Parent(s): d2cc84e

Add model card for PaCo-RL (#1)


- Add model card for PaCo-RL (863f6234ecda69e664b8b025d4d81ea599eaff91)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +85 -0
README.md ADDED
@@ -0,0 +1,85 @@
---
license: apache-2.0
pipeline_tag: image-to-image
library_name: diffusers
---

# PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling

This repository contains the official implementation of **PaCo-RL**, a reinforcement-learning framework for consistent image generation.

[📚 Paper](https://huggingface.co/papers/2512.04784) | [🌐 Project Page](https://x-gengroup.github.io/HomePage_PaCo-RL/) | [💻 Code](https://github.com/X-GenGroup/PaCo-RL) | [🤗 Models & Data](https://huggingface.co/collections/X-GenGroup/paco-rl)

PaCo-RL aims to preserve identities, styles, and logical coherence across multiple images, which is essential for applications such as storytelling and character design. It leverages reinforcement learning to learn complex, subjective visual criteria without large-scale datasets by combining a specialized consistency reward model (PaCo-Reward) with an efficient RL algorithm (PaCo-GRPO).

## Key Components

- **PaCo-Reward**: A pairwise consistency evaluator trained on a large-scale dataset constructed via automated sub-figure pairing. It evaluates consistency through a generative, autoregressive scoring mechanism enhanced by task-aware instructions and CoT reasoning.
- **PaCo-GRPO**: An efficient RL optimization strategy that combines resolution-decoupled optimization, which substantially reduces RL cost, with a log-tamed multi-reward aggregation mechanism that keeps reward optimization balanced and stable (a sketch of the aggregation idea follows this list).

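This model card does not spell out the aggregation formula; the sketch below only illustrates the general log-taming idea. The `log1p` transform, reward names, and weights are assumptions for illustration, not the repository's actual implementation:

```python
import math

def log_tamed_aggregate(rewards: dict[str, float], weights: dict[str, float]) -> float:
    """Combine several reward signals after a log transform.

    log1p compresses large raw rewards, so no single signal (e.g.
    consistency vs. aesthetics) can dominate the policy update.
    Transform, names, and weights are illustrative assumptions.
    """
    return sum(
        weights[name] * math.log1p(max(rewards[name], 0.0))
        for name in rewards
    )

# Hypothetical per-sample rewards from two evaluators
rewards = {"consistency": 3.2, "aesthetics": 0.8}
weights = {"consistency": 1.0, "aesthetics": 0.5}
print(log_tamed_aggregate(rewards, weights))  # one scalar for the GRPO update
```
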
## 🚀 Quick Start

For detailed instructions on installation, training the reward model, and running RL training, please refer to the [GitHub repository](https://github.com/X-GenGroup/PaCo-RL).

### Installation
```bash
git clone https://github.com/X-GenGroup/PaCo-RL.git
cd PaCo-RL
```

### Train Reward Model
```bash
cd PaCo-Reward
conda create -n paco-reward python=3.12 -y
conda activate paco-reward
cd LLaMA-Factory && pip install -e ".[torch,metrics]" --no-build-isolation
cd .. && bash train/paco_reward.sh
```

### Run RL Training
```bash
cd PaCo-GRPO
conda create -n paco-grpo python=3.12 -y
conda activate paco-grpo
pip install -e .

# Set up the vLLM reward server
conda create -n vllm python=3.12 -y
conda activate vllm && pip install vllm
export CUDA_VISIBLE_DEVICES=0
export VLLM_MODEL_PATHS='X-GenGroup/PaCo-Reward-7B'
export VLLM_MODEL_NAMES='Paco-Reward-7B'
bash vllm_server/launch.sh

# Start training
export CUDA_VISIBLE_DEVICES=1,2,3,4,5,6,7
conda activate paco-grpo
bash scripts/single_node/train_flux.sh t2is
```

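The training scripts talk to the reward server internally, but you can also query it directly to sanity-check a pair of images. A minimal sketch, assuming `vllm_server/launch.sh` exposes vLLM's standard OpenAI-compatible API on `localhost:8000`; the port, prompt wording, and file names are assumptions, and the exact instruction format used in training lives in the repository:

```python
import base64
from openai import OpenAI  # pip install openai

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def to_data_url(path: str) -> str:
    # Encode a local image as a base64 data URL for the chat API
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="Paco-Reward-7B",  # must match VLLM_MODEL_NAMES above
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": to_data_url("panel_a.png")}},
            {"type": "image_url", "image_url": {"url": to_data_url("panel_b.png")}},
            {"type": "text", "text": "Rate the character consistency between these two images."},
        ],
    }],
)
# PaCo-Reward scores generatively, so the reply contains reasoning plus a score
print(response.choices[0].message.content)
```
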
## 🎁 Model Zoo

| Model | Type | Hugging Face |
|-------|------|--------------|
| **PaCo-Reward-7B** | Reward Model | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-Reward-7B) |
| **PaCo-Reward-7B-Lora** | Reward Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-Reward-7B-Lora) |
| **PaCo-FLUX.1-dev** | T2I Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-FLUX.1-dev-Lora) |
| **PaCo-FLUX.1-Kontext-dev** | Image Editing Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-FLUX.1-Kontext-Lora) |
| **PaCo-QwenImage-Edit** | Image Editing Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-Qwen-Image-Edit-Lora) |

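The FLUX LoRA can be tried directly with 🤗 diffusers. A minimal sketch, assuming the adapter ships in the standard diffusers LoRA format; the prompt and sampler settings are illustrative:

```python
import torch
from diffusers import FluxPipeline

# Load the base model, then apply the PaCo consistency LoRA on top
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("X-GenGroup/PaCo-FLUX.1-dev-Lora")

# A multi-panel prompt: cross-panel identity is exactly what PaCo-RL optimizes
image = pipe(
    "A 2x2 grid showing the same red-haired knight reading, cooking, "
    "riding a horse, and sleeping, consistent character design",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("paco_flux_sample.png")
```
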
## Acknowledgement

Our work is built upon [Flow-GRPO](https://github.com/yifan123/flow_grpo), [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), [vLLM](https://github.com/vllm-project/vllm), and [Qwen2.5-VL](https://github.com/QwenLM/Qwen3-VL). We sincerely thank the authors for their valuable contributions to the community.

## Citation

If you find our work helpful or inspiring, please feel free to cite it:
```bibtex
@misc{ping2025pacorladvancingreinforcementlearning,
      title={PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling},
      author={Bowen Ping and Chengyou Jia and Minnan Luo and Changliang Xia and Xin Shen and Zhuohang Dang and Hangwei Qian},
      year={2025},
      eprint={2512.04784},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.04784},
}
```