Jayce-Ping committed
Commit 932497c · verified · 1 parent: 56a872f

Update README.md

Files changed (1):
  1. README.md (+6, −8)
README.md CHANGED
@@ -21,11 +21,11 @@ This model is presented in the paper [PaCo-RL: Advancing Reinforcement Learning
 - **Code Repository**: https://github.com/X-GenGroup/PaCo-RL
 - **Data & Models Collection**: https://huggingface.co/collections/X-GenGroup/paco-rl
 
-## Overview
+## 🌟 Overview
 
 PaCo-RL argues that reinforcement learning offers a promising alternative for learning complex and subjective visual criteria in a data-free manner. The framework combines a specialized consistency reward model with an efficient RL algorithm.
 
-### Key Components
+### 🔑 Key Components
 
 - **PaCo-Reward**: A pairwise consistency evaluator trained on a large-scale dataset constructed via automated sub-figure pairing. It evaluates consistency through a generative, autoregressive scoring mechanism enhanced by task-aware instructions and Chain-of-Thought (CoT) reasoning.
 - **PaCo-GRPO**: An efficient RL algorithm leveraging a novel resolution-decoupled optimization strategy to substantially reduce RL cost, alongside a log-tamed multi-reward aggregation mechanism that ensures balanced and stable reward optimization (a toy sketch of this aggregation follows the hunk below).
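
The diff does not spell out the log-tamed aggregation, so the following is only an illustrative sketch under assumptions: `log_tamed_aggregate`, its `log1p` compression, and the uniform default weights are inventions for illustration, not PaCo-RL's actual formulation (see the official repository for that).

```python
import math
from typing import Sequence

def log_tamed_aggregate(rewards: Sequence[float],
                        weights: Sequence[float] | None = None) -> float:
    """Toy sketch of log-tamed multi-reward aggregation (NOT PaCo-RL's code).

    Compressing each reward with log1p keeps one large reward from
    drowning out the others, which is one way to obtain the "balanced
    and stable" optimization the README describes.
    """
    if weights is None:
        weights = [1.0] * len(rewards)
    # log1p grows slowly for large rewards, so outliers are tamed.
    return sum(w * math.log1p(max(r, 0.0)) for r, w in zip(rewards, weights))

# Example: a consistency reward and an aesthetic reward on different scales.
print(log_tamed_aggregate([8.0, 0.9]))  # ≈ 2.84; the 8.0 no longer dominates
```

Log-compression is only one possible taming choice; the repository's implementation may weight or normalize the rewards differently.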
@@ -38,9 +38,8 @@ Extensive experiments show that PaCo-Reward significantly improves alignment wit
 
 
 
-## Example Usage
+## 💻 Example Usage
 For detailed installation, training of the reward model (PaCo-Reward), and running the full RL training (PaCo-GRPO), please refer to the [official GitHub repository](https://github.com/X-GenGroup/PaCo-RL). The repository provides comprehensive documentation for each component.
-
 ```python
 import torch
 from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
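
Between these two hunks the diff view elides the body of the example (README lines 46–126). Purely as a hedged stand-in, a minimal pairwise scoring call in the same Qwen2.5-VL style might look like the sketch below; the checkpoint id `X-GenGroup/PaCo-Reward-7B`, the file names, and the prompt text are assumptions, not the README's actual code.

```python
import torch
from PIL import Image
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor

model_id = "X-GenGroup/PaCo-Reward-7B"  # hypothetical id; check the Model Zoo table
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# PaCo-Reward evaluates consistency pairwise, so both images go in one turn.
reference = Image.open("reference.png")
candidate = Image.open("candidate.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "image"},
        {"type": "text", "text": "Rate the subject consistency of these two images."},
    ],
}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[reference, candidate],
                   return_tensors="pt").to(model.device)

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=512)
output_text = processor.batch_decode(
    generated[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(output_text[0])  # CoT rationale followed by a consistency score
```

Emitting the score as generated text after the rationale matches the "generative, autoregressive scoring mechanism" described in the Key Components above.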
@@ -127,7 +126,7 @@ print(output_text[0])
 ```
 
 
-## Model Zoo
+## 🎁 Model Zoo
 
 The PaCo-RL framework includes several models available on Hugging Face:
 
@@ -139,12 +138,11 @@ The PaCo-RL framework includes several models available on Hugging Face:
 | **PaCo-FLUX.1-Kontext-dev** | Image Editing Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-FLUX.1-Kontext-Lora) |
 | **PaCo-QwenImage-Edit** | Image Editing Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-Qwen-Image-Edit-Lora) |
 
-## Acknowledgement
+## 🤗 Acknowledgement
 
 Our work is built upon [Flow-GRPO](https://github.com/yifan123/flow_grpo), [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), [vLLM](https://github.com/vllm-project/vllm), and [Qwen2.5-VL](https://github.com/QwenLM/Qwen3-VL). We sincerely thank the authors for their valuable contributions to the community.
 
-## Citation
-
+## ⭐ Citation
 ```bibtex
 @misc{ping2025pacorladvancingreinforcementlearning,
       title={PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling},