BAGEL-RecA
Just 6 × 80GB A100s × 4.5 hours to boost BAGEL performance across all tasks! Outperforms FLUX-Kontext in image editing capabilities!
A self-supervised training framework that aligns understanding and generation with modest compute, yielding large zero-shot gains in generation and editing capability.
Paper
Reconstruction Alignment Improves Unified Multimodal Models
Project Page
https://reconstruction-alignment.github.io/
Code
https://github.com/HorizonWind2004/reconstruction-alignment
This repository hosts the model weights (NF4, INT8, BF16) for BAGEL-RecA. We fine-tuned BAGEL on six 80GB NVIDIA A800 GPUs for only 27 GPU hours. While the understanding capability remains unchanged, our ReAlign method brings zero-shot improvements of +3.6 on GenEval, +1.26 on DPGBench, +0.37 on ImgEdit, and +0.33 on GEdit.
For installation, usage instructions, and further documentation, please visit our repo or BAGEL's original GitHub repo.
A DF11 version of BAGEL-RecA is also available; many thanks to @theunlikely!
Method
Benchmarks
1. Text-to-Image Generation
All images are generated at 1024×1024 resolution.
| Model | GenEval ↑ | DPGBench ↑ | WISE ↑ |
|---|---|---|---|
| BAGEL | 0.787 | 84.03 | 0.50 |
| BAGEL-RecA | 0.824 | 85.29 | 0.52 |
2. Image Editing
| Model | GEdit-Bench-EN (SC) ↑ | GEdit-Bench-EN (PQ) ↑ | GEdit-Bench-EN (O) ↑ | ImgEdit ↑ |
|---|---|---|---|---|
| BAGEL | 7.96 | 6.64 | 6.94 | 3.38 |
| BAGEL-NHR | 8.04 | 6.87 | 7.08 | 3.48 |
| BAGEL-RecA | 8.24 | 6.87 | 7.27 | 3.75 |
| FLUX Kontext | 6.95 | 7.30 | 6.27 | 3.59 |
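The gains quoted in the introduction can be re-derived from the two tables above. The following is a minimal sketch of that arithmetic, not part of the released code: the dictionary names and the `gains` helper are hypothetical, and the scores are copied by hand from the BAGEL and BAGEL-RecA rows. Note that the rounded GenEval values in the table give a difference of 0.037 (3.7 points on a 0-100 scale), slightly above the +3.6 quoted earlier; this is presumably a rounding effect in the reported scores, which is an assumption on our part.

```python
# Scores copied by hand from the benchmark tables (higher is better).
# GEdit uses the overall (O) column, which matches the quoted +0.33.
baseline = {
    "GenEval": 0.787,
    "DPGBench": 84.03,
    "WISE": 0.50,
    "GEdit-Bench-EN (O)": 6.94,
    "ImgEdit": 3.38,
}
reca = {
    "GenEval": 0.824,
    "DPGBench": 85.29,
    "WISE": 0.52,
    "GEdit-Bench-EN (O)": 7.27,
    "ImgEdit": 3.75,
}


def gains(before, after, ndigits=3):
    """Return the per-benchmark difference after - before, rounded."""
    return {name: round(after[name] - before[name], ndigits) for name in before}


delta = gains(baseline, reca)
for name, value in delta.items():
    print(f"{name}: {value:+.3f}")

# Training budget stated in the card: 6 GPUs for 4.5 hours each.
gpu_hours = 6 * 4.5
print(f"GPU hours: {gpu_hours}")
```

The same helper can be pointed at the BAGEL-NHR or FLUX Kontext rows to compare other pairs of models.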
License
BAGEL-RecA is licensed under the Apache 2.0 license.
Citation
If you find our work inspiring or use our codebase in your research, please consider giving the repo a star and citing our paper:
@article{xie2025reconstruction,
title={Reconstruction Alignment Improves Unified Multimodal Models},
author={Xie, Ji and Darrell, Trevor and Zettlemoyer, Luke and Wang, XuDong},
journal={arXiv preprint arXiv:2509.07295},
year={2025}
}
