BAGEL-RecA

🚀 Just 6 × 80GB A100s × 4.5 hours to boost BAGEL's generation and editing performance! Outperforms FLUX-Kontext on GEdit-Bench-EN (semantic consistency and overall score) and on ImgEdit!

A self-supervised training framework that aligns understanding and generation with modest compute, yielding large zero-shot gains in generation and editing capability.

Paper

Reconstruction Alignment Improves Unified Multimodal Models

Project Page

https://reconstruction-alignment.github.io/

Code

https://github.com/HorizonWind2004/reconstruction-alignment

This repository hosts the model weights (NF4, INT8, BF16) for BAGEL-RecA. We fine-tuned BAGEL on six 80GB NVIDIA A800 GPUs for only 27 GPU hours. While understanding capability remains unchanged, our ReAlign method brings zero-shot improvements of +3.6 on GenEval, +1.26 on DPGBench, +0.37 on ImgEdit, and +0.33 on GEdit.
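To help choose between the NF4, INT8, and BF16 weights, here is a minimal back-of-the-envelope sketch of the weights-only memory footprint per format. It assumes 4, 8, and 16 bits per parameter respectively and ignores quantization metadata, layers kept in higher precision, activations, and caches, so treat the result as a lower bound. The parameter count `ASSUMED_PARAMS` is a hypothetical placeholder, not a figure taken from this model card; replace it with the actual count for the checkpoint you download.

```python
# Approximate bytes per parameter for each released weight format.
# NF4 is a 4-bit format, INT8 is 8-bit, BF16 is 16-bit.
BYTES_PER_PARAM = {"NF4": 0.5, "INT8": 1.0, "BF16": 2.0}


def weight_footprint_gib(num_params):
    """Return a rough weights-only size in GiB for each format."""
    return {fmt: num_params * size / 2**30 for fmt, size in BYTES_PER_PARAM.items()}


# HYPOTHETICAL parameter count (an assumption, not stated in this card).
ASSUMED_PARAMS = 14e9

FOOTPRINT = weight_footprint_gib(ASSUMED_PARAMS)
for fmt, gib in FOOTPRINT.items():
    print(f"{fmt}: ~{gib:.1f} GiB (weights only)")
```

Whatever the true parameter count, the ratios hold: under these assumptions INT8 needs roughly half the memory of BF16, and NF4 roughly a quarter.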

For installation, usage instructions, and further documentation, please visit our repo or BAGEL's original GitHub repo.

A DF11 version of BAGEL-RecA is also available; many thanks to @theunlikely!

🧠 Method

📊 Benchmarks

1. Text-to-Image Generation

We evaluate at 1024×1024 resolution.

| Model | GenEval ↑ | DPGBench ↑ | WISE ↑ |
| --- | --- | --- | --- |
| BAGEL | 0.787 | 84.03 | 0.50 |
| BAGEL-RecA | 0.824 | 85.29 | 0.52 |

2. Image Editing

| Model | GEdit-Bench-EN (SC) ↑ | GEdit-Bench-EN (PQ) ↑ | GEdit-Bench-EN (O) ↑ | ImgEdit ↑ |
| --- | --- | --- | --- | --- |
| BAGEL | 7.96 | 6.64 | 6.94 | 3.38 |
| BAGEL-NHR | 8.04 | 6.87 | 7.08 | 3.48 |
| BAGEL-RecA | 8.24 | 6.87 | 7.27 | 3.75 |
| FLUX Kontext | 6.95 | 7.30 | 6.27 | 3.59 |
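
The gains of BAGEL-RecA over the BAGEL baseline can be recomputed directly from the two benchmark tables above. The short sketch below does that; the dictionary keys (for example `GEdit-O` for the GEdit-Bench-EN overall score) are labels chosen here for illustration. The table entries are already rounded, so the computed differences may differ in the last digit from figures quoted elsewhere. Note also that GenEval and WISE are on a 0 to 1 scale in the tables.

```python
# Scores copied from the two benchmark tables above (higher is better).
SCORES = {
    "BAGEL": {
        "GenEval": 0.787, "DPGBench": 84.03, "WISE": 0.50,
        "GEdit-O": 6.94, "ImgEdit": 3.38,
    },
    "BAGEL-RecA": {
        "GenEval": 0.824, "DPGBench": 85.29, "WISE": 0.52,
        "GEdit-O": 7.27, "ImgEdit": 3.75,
    },
}


def gains(base, tuned):
    """Absolute per-benchmark gain of `tuned` over `base`, rounded to 3 decimals."""
    return {name: round(tuned[name] - base[name], 3) for name in base}


DELTAS = gains(SCORES["BAGEL"], SCORES["BAGEL-RecA"])
for name, delta in DELTAS.items():
    print(f"{name}: {delta:+.3f}")
```

Adding another row (for example FLUX Kontext on the editing metrics) only requires a new entry in `SCORES` restricted to the benchmarks that model reports.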

License

BAGEL-RecA is licensed under the Apache 2.0 license.

✍️ Citation

If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation~

@article{xie2025reconstruction,
  title={Reconstruction Alignment Improves Unified Multimodal Models},
  author={Xie, Ji and Darrell, Trevor and Zettlemoyer, Luke and Wang, XuDong},
  journal={arXiv preprint arXiv:2509.07295},
  year={2025}
}