---
pipeline_tag: image-text-to-text
library_name: transformers
---

# PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling

PaCo-RL is a comprehensive framework designed for consistent image generation using reinforcement learning. It tackles the challenges of preserving identities, styles, and logical coherence across multiple images, which is crucial for applications such as storytelling and character design.

This model is presented in the paper [PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling](https://arxiv.org/abs/2512.04784).

## Overview

PaCo-RL argues that reinforcement learning offers a promising alternative for learning complex and subjective visual criteria in a data-free manner. The framework combines a specialized consistency reward model with an efficient RL algorithm.

### Key Components

- **PaCo-Reward:** A pairwise consistency evaluator trained on a large-scale dataset constructed via automated sub-figure pairing. It evaluates consistency through a generative, autoregressive scoring mechanism enhanced by task-aware instructions and Chain-of-Thought (CoT) reasoning.
- **PaCo-GRPO:** An efficient RL algorithm that leverages a novel resolution-decoupled optimization strategy to substantially reduce RL training cost, alongside a log-tamed multi-reward aggregation mechanism that keeps reward optimization balanced and stable.
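To make the log-tamed aggregation idea concrete, here is a minimal sketch of how multiple reward signals might be combined. The function name, weights, and exact taming formula (`log1p`) are illustrative assumptions, not the official PaCo-GRPO implementation; see the paper and repository for the authoritative formulation.

```python
import math

def log_tamed_aggregate(rewards, weights=None):
    """Combine several per-sample rewards into one scalar.

    Passing each reward through log1p before the weighted sum dampens
    outliers, so no single signal (e.g. consistency vs. aesthetics) can
    dominate the policy gradient. Illustrative sketch only, not the
    official PaCo-GRPO formula.
    """
    if weights is None:
        weights = [1.0] * len(rewards)
    assert len(weights) == len(rewards)
    # Clamp at 0 so log1p stays defined for any non-positive reward.
    return sum(w * math.log1p(max(r, 0.0)) for w, r in zip(weights, rewards))
```

For example, a runaway reward of 100 contributes only `log1p(100) ≈ 4.6` to the aggregate, keeping the combined objective on a comparable scale across reward sources.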

Extensive experiments show that PaCo-Reward significantly improves alignment with human perceptions of visual consistency, and PaCo-GRPO achieves state-of-the-art consistency performance with improved training efficiency and stability.

*(Figure: PaCo-RL overview)*

## Quick Start

For detailed installation, training of the reward model (PaCo-Reward), and running the full RL training (PaCo-GRPO), please refer to the official GitHub repository. The repository provides comprehensive documentation for each component.
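Pending the full documentation in the repository, a minimal inference sketch for this LoRA reward model might look as follows. It assumes the adapter is applied on top of a Qwen2.5-VL base (per the acknowledgements) via `peft`, and that the model is queried with a pairwise, task-aware prompt; the base checkpoint, hub id, and prompt template below are placeholders, so consult the official GitHub repository for the authoritative recipe.

```python
def build_pairwise_prompt(instruction: str) -> str:
    """Assemble a task-aware pairwise-consistency query.

    The exact template PaCo-Reward was trained with is defined in the
    official repo; this one is a hypothetical stand-in.
    """
    return (
        "You are given two images from the same visual task.\n"
        f"Task: {instruction}\n"
        "Reason step by step about identity, style, and logical "
        "coherence, then output a consistency score from 0 to 10."
    )


def load_paco_reward(base_model: str = "Qwen/Qwen2.5-VL-7B-Instruct"):
    """Attach the PaCo-Reward LoRA adapter to its (assumed) base model."""
    from peft import PeftModel
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    processor = AutoProcessor.from_pretrained(base_model)
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        base_model, torch_dtype="auto", device_map="auto"
    )
    # Replace with the full hub id of this adapter, e.g. "<org>/PaCo-Reward-7B-Lora".
    model = PeftModel.from_pretrained(model, "PaCo-Reward-7B-Lora")
    return model, processor
```

Scoring a pair would then follow the standard Qwen2.5-VL chat flow: pass both images plus `build_pairwise_prompt(...)` through the processor, generate, and parse the score from the CoT output.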

## Model Zoo

The PaCo-RL framework includes several models available on Hugging Face:

| Model | Type | HuggingFace |
|-------|------|-------------|
| PaCo-Reward-7B | Reward Model | 🤗 Link |
| PaCo-Reward-7B-Lora | Reward Model (LoRA) | 🤗 Link |
| PaCo-FLUX.1-dev | T2I Model (LoRA) | 🤗 Link |
| PaCo-FLUX.1-Kontext-dev | Image Editing Model (LoRA) | 🤗 Link |
| PaCo-QwenImage-Edit | Image Editing Model (LoRA) | 🤗 Link |

## Acknowledgement

Our work is built upon Flow-GRPO, LLaMA-Factory, vLLM, and Qwen2.5-VL. We sincerely thank the authors for their valuable contributions to the community.

## Citation

```bibtex
@misc{ping2025pacorladvancingreinforcementlearning,
      title={PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling},
      author={Bowen Ping and Chengyou Jia and Minnan Luo and Changliang Xia and Xin Shen and Zhuohang Dang and Hangwei Qian},
      year={2025},
      eprint={2512.04784},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.04784},
}
```
⭐ Star us on GitHub if you find PaCo-RL helpful!