Add model card for PaCo-Reward-7B with metadata, paper, project, and code links

This PR adds a comprehensive model card for the PaCo-Reward-7B model, a component of the PaCo-RL framework.
It includes:
- Relevant metadata: `pipeline_tag` (image-text-to-text) and `library_name` (transformers), derived from model configuration and functionality.
- Links to the research paper ([PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling](https://huggingface.co/papers/2512.04784)), the project page (https://x-gengroup.github.io/HomePage_PaCo-RL/), and the GitHub repository (https://github.com/X-GenGroup/PaCo-RL).
- A concise description of the model based on the paper abstract.
- The 'Overview' and 'Model Zoo' sections from the GitHub README to provide better context and discoverability of related models.
- The BibTeX citation for the paper.
The `license` field and a sample-usage code snippet are omitted, as no explicit evidence for either was found in the provided documentation, in keeping with the safety guidelines.
Please review and merge if these improvements are satisfactory.
---
library_name: transformers
pipeline_tag: image-text-to-text
---

# PaCo-Reward-7B: A Pairwise Consistency Evaluator from the PaCo-RL Framework

This repository contains **PaCo-Reward-7B**, a key component of the **PaCo-RL** framework, as presented in the paper:

[**PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling**](https://huggingface.co/papers/2512.04784)

The **PaCo-RL** framework is designed for consistent image generation through reinforcement learning, aiming to preserve identities, styles, and logical coherence across multiple images for applications such as storytelling and character design. **PaCo-Reward-7B** acts as the framework's pairwise consistency evaluator: trained on a large-scale dataset constructed via automated sub-figure pairing, it evaluates consistency through a generative, autoregressive scoring mechanism enhanced by task-aware instructions and Chain-of-Thought (CoT) reasoning.
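The generative, autoregressive scoring idea can be illustrated with a minimal sketch: if an evaluator emits next-token logits for two candidate answers (say "A" and "B"), a pairwise preference probability follows from a two-way softmax over those logits. Everything below (the function name, the two-candidate setup) is a hypothetical illustration, not the model's documented interface.

```python
import math

def pairwise_preference(logit_a: float, logit_b: float) -> float:
    """Two-way softmax over an evaluator's logits for candidate answers
    "A" and "B". Hypothetical illustration of generative pairwise scoring;
    the actual PaCo-Reward-7B prompt format and head are not documented here.
    """
    m = max(logit_a, logit_b)  # subtract the max for numerical stability
    ea = math.exp(logit_a - m)
    eb = math.exp(logit_b - m)
    return ea / (ea + eb)
```

A score above 0.5 would indicate that the first candidate is judged more consistent; equal logits give exactly 0.5.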

- **Project Page:** https://x-gengroup.github.io/HomePage_PaCo-RL/
- **Code Repository:** https://github.com/X-GenGroup/PaCo-RL

## Overview

**PaCo-RL** is a comprehensive framework for consistent image generation through reinforcement learning, addressing the challenge of preserving identities, styles, and logical coherence across multiple images for storytelling and character-design applications.

### Key Components

- **PaCo-Reward**: A pairwise consistency evaluator with task-aware instructions and CoT reasoning.
- **PaCo-GRPO**: Efficient RL optimization with resolution-decoupled training and log-tamed multi-reward aggregation.
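As a toy illustration of the "log-tamed" idea, one way to keep a single reward from dominating a multi-reward aggregate is to compress each term through a signed `log1p` before taking the weighted sum. This is a sketch under that assumption; the paper's exact aggregation rule may differ.

```python
import math

def log_tamed_aggregate(rewards, weights=None):
    """Weighted sum of rewards, each tamed by a signed log1p so that one
    very large reward cannot dominate the aggregate. Hypothetical form;
    the paper's exact aggregation rule is not reproduced here.
    """
    if weights is None:
        weights = [1.0] * len(rewards)
    # copysign preserves the sign of each reward while log1p compresses
    # its magnitude (log1p(0) == 0, so zero rewards stay zero)
    return sum(w * math.copysign(math.log1p(abs(r)), r)
               for w, r in zip(weights, rewards))
```

With this taming, a reward of 100 contributes only about 4.6 to the sum, so moderate rewards from other objectives remain influential.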

## Model Zoo

This model is part of a larger collection of models within the PaCo-RL framework. More related models can be found in the [PaCo-RL Hugging Face collection](https://huggingface.co/collections/X-GenGroup/paco-rl).

| Model | Type | HuggingFace |
| :---------------------- | :------------------------- | :----------------------------------------------------------------------- |
| **PaCo-Reward-7B** | Reward Model | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-Reward-7B) |
| **PaCo-Reward-7B-Lora** | Reward Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-Reward-7B-Lora) |
| **PaCo-FLUX.1-dev** | T2I Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-FLUX.1-dev-Lora) |
| **PaCo-FLUX.1-Kontext-dev** | Image Editing Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-FLUX.1-Kontext-Lora) |
| **PaCo-QwenImage-Edit** | Image Editing Model (LoRA) | [🤗 Link](https://huggingface.co/X-GenGroup/PaCo-Qwen-Image-Edit-Lora) |

## Citation

If you find our work helpful or inspiring, please feel free to cite it:

```bibtex
@misc{ping2025pacorladvancingreinforcementlearning,
      title={PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling},
      author={Bowen Ping and Chengyou Jia and Minnan Luo and Changliang Xia and Xin Shen and Zhuohang Dang and Hangwei Qian},
      year={2025},
      eprint={2512.04784},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.04784},
}
```