This model is presented in the paper *PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling*.

- **Code Repository**: https://github.com/X-GenGroup/PaCo-RL
- **Data & Models Collection**: https://huggingface.co/collections/X-GenGroup/paco-rl

## Overview
PaCo-RL argues that reinforcement learning offers a promising alternative for learning complex and subjective visual criteria in a data-free manner. The framework combines a specialized consistency reward model with an efficient RL algorithm.
### Key Components
- **PaCo-Reward**: A pairwise consistency evaluator trained on a large-scale dataset constructed via automated sub-figure pairing. It evaluates consistency through a generative, autoregressive scoring mechanism enhanced by task-aware instructions and Chain-of-Thought (CoT) reasoning.
- **PaCo-GRPO**: An efficient RL algorithm leveraging a novel resolution-decoupled optimization strategy to substantially reduce RL cost, alongside a log-tamed multi-reward aggregation mechanism that ensures balanced and stable reward optimization (see the sketch below).
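
The exact aggregation rule is defined in the paper and the GitHub repository; the snippet below is only a rough illustration of the "log-tamed" idea, combining several per-sample rewards after a logarithmic transform so that no single reward term dominates. The function name, the weights, and the use of `log1p` are assumptions made for this sketch, not the official implementation.

```python
import math


def log_tamed_aggregate(rewards: dict[str, float],
                        weights: dict[str, float] | None = None) -> float:
    """Illustrative (unofficial) log-tamed aggregation of multiple rewards.

    Each non-negative reward is compressed with log1p before a weighted sum,
    which keeps a single large reward from swamping the combined signal while
    preserving the relative ordering of small rewards.
    """
    weights = weights or {name: 1.0 for name in rewards}
    total = 0.0
    for name, value in rewards.items():
        total += weights.get(name, 1.0) * math.log1p(max(value, 0.0))
    return total


# Example: a consistency reward combined with an auxiliary image-quality reward.
print(log_tamed_aggregate({"consistency": 0.92, "image_quality": 4.7}))
```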
## Example Usage
For detailed instructions on installation, reward model training (PaCo-Reward), and full RL training (PaCo-GRPO), please refer to the [official GitHub repository](https://github.com/X-GenGroup/PaCo-RL), which provides comprehensive documentation for each component.

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor

# ...

print(output_text[0])
```
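
As a minimal, self-contained sketch (not the official example), the following shows how a Qwen2.5-VL-based reward checkpoint could be loaded and queried with `transformers`. The repository id, image paths, and instruction text are placeholders; see the Hugging Face collection above for the actual checkpoints and the GitHub repository for the official scoring prompt.

```python
import torch
from PIL import Image
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor

# Placeholder repository id -- check the PaCo-RL collection for the real checkpoint name.
MODEL_ID = "X-GenGroup/PaCo-Reward"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Pairwise consistency evaluation: a reference image and a newly generated image.
reference = Image.open("reference.png")
candidate = Image.open("candidate.png")

# Placeholder instruction; the task-aware prompts used by PaCo-Reward live in the repository.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "image"},
            {"type": "text", "text": "Evaluate the subject consistency between the two images and give a score."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[reference, candidate], return_tensors="pt").to(model.device)

with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=256)

# Drop the prompt tokens, then decode the model's reasoning and score.
trimmed = generated_ids[:, inputs["input_ids"].shape[1]:]
output_text = processor.batch_decode(trimmed, skip_special_tokens=True)
print(output_text[0])
```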
## Model Zoo
The PaCo-RL framework includes several models available on Hugging Face:

| Model | Type | Link |
|---|---|---|
| **PaCo-FLUX.1-Kontext-dev** | Image Editing Model (LoRA) | [π€ Link](https://huggingface.co/X-GenGroup/PaCo-FLUX.1-Kontext-Lora) |
| **PaCo-QwenImage-Edit** | Image Editing Model (LoRA) | [π€ Link](https://huggingface.co/X-GenGroup/PaCo-Qwen-Image-Edit-Lora) |
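
The editing checkpoints listed above are LoRA adapters on top of their respective base models. As a rough sketch, assuming the LoRA repository uses the standard `diffusers` weight layout, the FLUX.1-Kontext adapter could be applied as follows; the input image, prompt, and guidance value are placeholders.

```python
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

# Base model plus the PaCo LoRA from the table above. The LoRA file layout is assumed to be
# the standard diffusers format; check the model card for the exact loading instructions.
pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("X-GenGroup/PaCo-FLUX.1-Kontext-Lora")

source = load_image("input.png")  # placeholder input image
edited = pipe(
    image=source,
    prompt="Replace the background with a sunny beach while keeping the subject unchanged.",
    guidance_scale=2.5,
).images[0]
edited.save("edited.png")
```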
## Acknowledgement
Our work is built upon [Flow-GRPO](https://github.com/yifan123/flow_grpo), [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), [vLLM](https://github.com/vllm-project/vllm), and [Qwen2.5-VL](https://github.com/QwenLM/Qwen3-VL). We sincerely thank the authors for their valuable contributions to the community.
## Citation
```bibtex
@misc{ping2025pacorladvancingreinforcementlearning,
      title={PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling},