Jayce-Ping committed
Commit 932497c · verified · 1 parent: 56a872f

Update README.md

Files changed (1):
  1. README.md (+6, −8)
README.md CHANGED
@@ -21,11 +21,11 @@ This model is presented in the paper [PaCo-RL: Advancing Reinforcement Learning
 - **Code Repository**: https://github.com/X-GenGroup/PaCo-RL
 - **Data & Models Collection**: https://huggingface.co/collections/X-GenGroup/paco-rl
 
-## Overview
+## 🌟 Overview
 
 PaCo-RL argues that reinforcement learning offers a promising alternative for learning complex and subjective visual criteria in a data-free manner. The framework combines a specialized consistency reward model with an efficient RL algorithm.
 
-### Key Components
+### 🔑 Key Components
 
 - **PaCo-Reward**: A pairwise consistency evaluator trained on a large-scale dataset constructed via automated sub-figure pairing. It evaluates consistency through a generative, autoregressive scoring mechanism enhanced by task-aware instructions and Chain-of-Thought (CoT) reasoning.
 - **PaCo-GRPO**: An efficient RL algorithm leveraging a novel resolution-decoupled optimization strategy to substantially reduce RL cost, alongside a log-tamed multi-reward aggregation mechanism that ensures balanced and stable reward optimization (a toy sketch of this aggregation follows the hunk below).
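
The diff does not spell out the log-tamed aggregation, so the following is only an illustrative sketch under assumptions: `log_tamed_aggregate`, its `log1p` compression, and the uniform default weights are inventions for illustration, not PaCo-RL's actual formulation (see the official repository for that).

```python
import math
from typing import Sequence

def log_tamed_aggregate(rewards: Sequence[float],
                        weights: Sequence[float] | None = None) -> float:
    """Toy sketch of log-tamed multi-reward aggregation (NOT PaCo-RL's code).

    Compressing each reward with log1p keeps one large reward from
    drowning out the others, which is one way to obtain the "balanced
    and stable" optimization the README describes.
    """
    if weights is None:
        weights = [1.0] * len(rewards)
    # log1p grows slowly for large rewards, so outliers are tamed.
    return sum(w * math.log1p(max(r, 0.0)) for r, w in zip(rewards, weights))

# Example: a consistency reward and an aesthetic reward on different scales.
print(log_tamed_aggregate([8.0, 0.9]))  # ≈ 2.84; the 8.0 no longer dominates
```

Log-compression is only one possible taming choice; the repository's implementation may weight or normalize the rewards differently.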
@@ -38,9 +38,8 @@ Extensive experiments show that PaCo-Reward significantly improves alignment wit
 
 
 
-## Example Usage
+## 💻 Example Usage
 For detailed installation, training of the reward model (PaCo-Reward), and running the full RL training (PaCo-GRPO), please refer to the [official GitHub repository](https://github.com/X-GenGroup/PaCo-RL). The repository provides comprehensive documentation for each component.
-
 ```python
 import torch
 from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
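
Between these two hunks the diff view elides the body of the example (README lines 46–126). Purely as a hedged stand-in, a minimal pairwise scoring call in the same Qwen2.5-VL style might look like the sketch below; the checkpoint id `X-GenGroup/PaCo-Reward-7B`, the file names, and the prompt text are assumptions, not the README's actual code.

```python
import torch
from PIL import Image
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor

model_id = "X-GenGroup/PaCo-Reward-7B"  # hypothetical id; check the Model Zoo table
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# PaCo-Reward evaluates consistency pairwise, so both images go in one turn.
reference = Image.open("reference.png")
candidate = Image.open("candidate.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "image"},
        {"type": "text", "text": "Rate the subject consistency of these two images."},
    ],
}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[reference, candidate],
                   return_tensors="pt").to(model.device)

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=512)
output_text = processor.batch_decode(
    generated[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(output_text[0])  # CoT rationale followed by a consistency score
```

Emitting the score as generated text after the rationale matches the "generative, autoregressive scoring mechanism" described in the Key Components above.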
@@ -127,7 +126,7 @@ print(output_text[0])
 ```
 
 
-## Model Zoo
+## 🎁 Model Zoo
 
 The PaCo-RL framework includes several models available on Hugging Face:
 
@@ -139,12 +138,11 @@ The PaCo-RL framework includes several models available on Hugging Face:
 | **PaCo-FLUX.1-Kontext-dev** | Image Editing Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-FLUX.1-Kontext-Lora) |
 | **PaCo-QwenImage-Edit** | Image Editing Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-Qwen-Image-Edit-Lora) |
 
-## Acknowledgement
+## 🤗 Acknowledgement
 
 Our work is built upon [Flow-GRPO](https://github.com/yifan123/flow_grpo), [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), [vLLM](https://github.com/vllm-project/vllm), and [Qwen2.5-VL](https://github.com/QwenLM/Qwen3-VL). We sincerely thank the authors for their valuable contributions to the community.
 
-## Citation
-
+## ⭐ Citation
 ```bibtex
 @misc{ping2025pacorladvancingreinforcementlearning,
       title={PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling},