Jayce-Ping committed · verified
Commit 2c89669 · 1 Parent(s): 1bf9d9d

Update README.md

Files changed (1)
  1. README.md +48 -52
README.md CHANGED
@@ -1,77 +1,69 @@
  ---
  license: apache-2.0
- pipeline_tag: image-to-image
  library_name: diffusers
  ---

  # PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling

- This repository contains the official implementation of **PaCo-RL**, a comprehensive framework for consistent image generation.

- [📚 Paper](https://huggingface.co/papers/2512.04784) | [🌐 Project Page](https://x-gengroup.github.io/HomePage_PaCo-RL/) | [💻 Code](https://github.com/X-GenGroup/PaCo-RL) | [🤗 Models & Data](https://huggingface.co/collections/X-GenGroup/paco-rl)

- PaCo-RL aims to preserve identities, styles, and logical coherence across multiple images, which is essential for applications such as storytelling and character design. It leverages reinforcement learning to learn complex, subjective visual criteria without large-scale datasets by combining a specialized consistency reward model (PaCo-Reward) with an efficient RL algorithm (PaCo-GRPO).

- ## Key Components

- - **PaCo-Reward**: A pairwise consistency evaluator trained on a large-scale dataset constructed via automated sub-figure pairing. It evaluates consistency through a generative, autoregressive scoring mechanism enhanced by task-aware instructions and CoT reasoning.
- - **PaCo-GRPO**: An efficient RL optimization strategy that leverages a novel resolution-decoupled optimization to substantially reduce RL cost, alongside a log-tamed multi-reward aggregation mechanism that ensures balanced and stable reward optimization.

- ## 🚀 Quick Start

- For detailed instructions on installation, training the reward model, and running RL training, please refer to the [GitHub repository](https://github.com/X-GenGroup/PaCo-RL).

- ### Installation
- ```bash
- git clone https://github.com/X-GenGroup/PaCo-RL.git
- cd PaCo-RL
- ```

- ### Train Reward Model
- ```bash
- cd PaCo-Reward
- conda create -n paco-reward python=3.12 -y
- conda activate paco-reward
- cd LLaMA-Factory && pip install -e ".[torch,metrics]" --no-build-isolation
- cd .. && bash train/paco_reward.sh
- ```

- ### Run RL Training
- ```bash
- cd PaCo-GRPO
- conda create -n paco-grpo python=3.12 -y
- conda activate paco-grpo
- pip install -e .
-
- # Set up the vLLM reward server
- conda create -n vllm python=3.12 -y
- conda activate vllm && pip install vllm
- export CUDA_VISIBLE_DEVICES=0
- export VLLM_MODEL_PATHS='X-GenGroup/PaCo-Reward-7B'
- export VLLM_MODEL_NAMES='Paco-Reward-7B'
- bash vllm_server/launch.sh
-
- # Start training
- export CUDA_VISIBLE_DEVICES=1,2,3,4,5,6,7
- conda activate paco-grpo
- bash scripts/single_node/train_flux.sh t2is
  ```
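
To sanity-check the reward server before launching training, it can be queried directly. The following is a minimal editor's sketch, assuming `vllm_server/launch.sh` exposes vLLM's standard OpenAI-compatible API on `localhost:8000`; the instruction text and image encoding are illustrative, not the repository's exact request format:

```python
# Editor's sketch: score a pair of images with the PaCo-Reward server.
# Assumes the launch script serves vLLM's OpenAI-compatible endpoint on
# localhost:8000; the prompt wording below is hypothetical.
import base64
from openai import OpenAI

def to_data_url(path: str) -> str:
    # Encode a local image as a base64 data URL for the chat API
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode()

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="Paco-Reward-7B",  # matches VLLM_MODEL_NAMES above
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": to_data_url("frame_a.png")}},
            {"type": "image_url", "image_url": {"url": to_data_url("frame_b.png")}},
            {"type": "text", "text": "Evaluate the consistency of these two images."},
        ],
    }],
)
print(response.choices[0].message.content)  # generative CoT assessment and score
```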

- ## 🎁 Model Zoo

  | Model | Type | HuggingFace |
  |-------|------|-------------|
- | **PaCo-Reward-7B** | Reward Model | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-Reward-7B) |
- | **PaCo-Reward-7B-Lora** | Reward Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-Reward-7B-Lora) |
- | **PaCo-FLUX.1-dev** | T2I Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-FLUX.1-dev-Lora) |
- | **PaCo-FLUX.1-Kontext-dev** | Image Editing Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-FLUX.1-Kontext-Lora) |
- | **PaCo-QwenImage-Edit** | Image Editing Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-Qwen-Image-Edit-Lora) |

- ## Acknowledgement
- Our work builds upon [Flow-GRPO](https://github.com/yifan123/flow_grpo), [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), [vLLM](https://github.com/vllm-project/vllm), and [Qwen2.5-VL](https://github.com/QwenLM/Qwen3-VL). We sincerely thank the authors for their valuable contributions to the community.

- ## Citation
- If you find our work helpful or inspiring, please feel free to cite it:
  ```bibtex
  @misc{ping2025pacorladvancingreinforcementlearning,
        title={PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling},
@@ -82,4 +74,8 @@ If you find our work helpful or inspiring, please feel free to cite it:
        primaryClass={cs.CV},
        url={https://arxiv.org/abs/2512.04784},
  }
- ```

  ---
  license: apache-2.0
+ pipeline_tag: text-to-image
  library_name: diffusers
  ---

  # PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling

+ <div align="center">
+ <a href='https://arxiv.org/abs/2512.04784'><img src='https://img.shields.io/badge/ArXiv-red?logo=arxiv'></a> &nbsp;
+ <a href='https://x-gengroup.github.io/HomePage_PaCo-RL/'><img src='https://img.shields.io/badge/ProjectPage-purple?logo=github'></a> &nbsp;
+ <a href="https://github.com/X-GenGroup/PaCo-RL"><img src="https://img.shields.io/badge/Code-9E95B7?logo=github"></a> &nbsp;
+ <a href='https://huggingface.co/collections/X-GenGroup/paco-rl'><img src='https://img.shields.io/badge/Data & Model-green?logo=huggingface'></a> &nbsp;
+ </div>

+ This repository contains the model presented in [PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling](https://huggingface.co/papers/2512.04784).

+ ## 🌟 Overview

+ **PaCo-RL** is a comprehensive framework for consistent image generation through reinforcement learning. It addresses the challenge of preserving identities, styles, and logical coherence across multiple images, which is essential for applications such as storytelling and character design.

+ ### Key Components

+ - **PaCo-Reward**: A pairwise consistency evaluator with task-aware instructions and CoT reasoning.
+ - **PaCo-GRPO**: Efficient RL optimization with resolution-decoupled training and log-tamed multi-reward aggregation (sketched below).
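
The exact aggregation rule is defined in the paper; purely as an illustration of the log-tamed idea, damping each reward with `log1p` before a weighted sum keeps any single large reward channel from dominating the others. The channel names, weights, and formula below are assumptions, not the published rule:

```python
import math

def log_tamed_aggregate(rewards: dict[str, float],
                        weights: dict[str, float]) -> float:
    # Editor's sketch of log-tamed multi-reward aggregation: log1p
    # compresses large rewards so no single channel dominates the sum.
    return sum(w * math.log1p(max(rewards[name], 0.0))
               for name, w in weights.items())

# Hypothetical reward channels and weights
score = log_tamed_aggregate(
    rewards={"consistency": 0.92, "aesthetics": 4.3},
    weights={"consistency": 1.0, "aesthetics": 0.5},
)
```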

+ ## 🚀 Example Usage

+ ```python
+ import torch
+ from diffusers import FluxKontextPipeline
+ from diffusers.utils import load_image
+ from peft import PeftModel
+
+ # Load the base FLUX.1 Kontext editing pipeline
+ pipe = FluxKontextPipeline.from_pretrained(
+     "black-forest-labs/FLUX.1-Kontext-dev",
+     torch_dtype=torch.bfloat16,
+     device_map="cuda"
+ )
+
+ # Attach the PaCo-RL LoRA adapter to the transformer
+ pipe.transformer = PeftModel.from_pretrained(
+     pipe.transformer,
+     "X-GenGroup/PaCo-FLUX.1-Kontext-Lora"
+ )
+
+ input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")
+
+ image = pipe(
+     image=input_image,
+     prompt="Add a blue hat to the cat",
+     guidance_scale=2.5
+ ).images[0]
+ image.save("cat_with_blue_hat.png")
  ```

+ ## 🎁 Model Zoo

  | Model | Type | HuggingFace |
  |-------|------|-------------|
+ | **PaCo-Reward-7B** | Reward Model | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-Reward-7B) |
+ | **PaCo-Reward-7B-Lora** | Reward Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-Reward-7B-Lora) |
+ | **PaCo-FLUX.1-dev** | T2I Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-FLUX.1-dev-Lora) |
+ | **PaCo-FLUX.1-Kontext-dev** | Image Editing Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-FLUX.1-Kontext-Lora) |
+ | **PaCo-QwenImage-Edit** | Image Editing Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-Qwen-Image-Edit-Lora) |
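
The editing example above uses the Kontext adapter; the T2I adapter from the zoo can be tried the same way. A minimal sketch, assuming **PaCo-FLUX.1-dev-Lora** attaches to the base FLUX.1-dev transformer via PEFT just like the Kontext adapter (the prompt and guidance value are illustrative):

```python
import torch
from diffusers import FluxPipeline
from peft import PeftModel

# Base text-to-image pipeline
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
    device_map="cuda"
)
# Assumption: the T2I LoRA loads the same way as the Kontext one above
pipe.transformer = PeftModel.from_pretrained(
    pipe.transformer,
    "X-GenGroup/PaCo-FLUX.1-dev-Lora"
)
image = pipe(
    prompt="A young knight and his gray horse crossing a misty stone bridge",
    guidance_scale=3.5  # illustrative setting
).images[0]
image.save("paco_t2i_sample.png")
```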

+ ## ⭐ Citation

  ```bibtex
  @misc{ping2025pacorladvancingreinforcementlearning,
        title={PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling},

        primaryClass={cs.CV},
        url={https://arxiv.org/abs/2512.04784},
  }
+ ```
+
+ <div align="center">
+ <sub>⭐ Star us on GitHub if you find PaCo-RL helpful!</sub>
+ </div>