---
license: apache-2.0
tags:
- text-to-image
- image-to-image
- inpainting
- controlnet
- diffusers
- gguf
- z-image-turbo
pipeline_tag: text-to-image
---
# Z-Image Turbo Control Unified V2 (V2.1)
[GitHub: VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun)
[Original model: alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1](https://huggingface.co/alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1)
This repository hosts the **Z-Image Turbo Control Unified V2** model. This is a specialized architecture that unifies the powerful **Z-Image Turbo** base transformer with enhanced **ControlNet** capabilities into a single, cohesive model. This unified pipeline supports multiple generation modes in one place: **Text-to-Image, Image-to-Image, ControlNet, and Inpainting**.
Unlike traditional pipelines where ControlNet is an external add-on, this model integrates control layers directly into the transformer structure. This enables **Unified GGUF Quantization**, allowing the entire merged architecture (Base + Control) to be quantized (e.g., Q4_K_M, Q8_0) and run efficiently on consumer hardware with limited VRAM. This version also introduces significant optimizations, architectural improvements, and bug fixes for features like `group_offload`.
## 📥 Installation
To set up the environment, simply install the dependencies:
```bash
# Create a virtual environment
python -m venv venv
# Activate it (Linux/macOS: source venv/bin/activate; Windows: venv\Scripts\activate)
# Upgrade pip
python -m pip install --upgrade pip
# Install the requirements
pip install -r requirements.txt
```
*Note: This repository contains a `diffusers_local` folder with the custom `ZImageControlUnifiedPipeline` and transformer logic required to run this specific architecture.*
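Before constructing the pipeline, the bundled custom code must be importable. A minimal sketch, assuming your script lives in the repository root (the `diffusers_local` folder name comes from this repository; the module layout inside it is not shown here):

```python
import os
import sys

# Make the bundled custom pipeline code importable before building the pipeline.
# "diffusers_local" ships in this repository; adjust the path if your script
# lives elsewhere.
repo_root = os.path.dirname(os.path.abspath(__file__)) if "__file__" in globals() else os.getcwd()
sys.path.insert(0, os.path.join(repo_root, "diffusers_local"))
```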
## 📂 Repository Structure
* `./transformer/z_image_turbo_control_unified_v2.1_q4_k_m.gguf`: The unified, Q4_K_M-quantized model weights.
* `./transformer/z_image_turbo_control_unified_v2.1_q8_0.gguf`: The unified, Q8_0-quantized model weights.
* `infer_controlnet.py`: Script for ControlNet inference.
* `infer_inpaint.py`: Script for inpainting inference.
* `infer_t2i.py`: Script for text-to-image inference.
* `infer_i2i.py`: Script for image-to-image inference.
* `diffusers_local/`: Custom pipeline code (`ZImageControlUnifiedPipeline`) and transformer logic.
* `requirements.txt`: Python dependencies.
## 🚀 Usage
The primary script for inference is `infer_controlnet.py`, which handles all supported generation modes.
### Option 1: Low VRAM (GGUF) - Recommended
Use this version if you have limited VRAM (e.g., 6GB - 8GB). It loads the model from a quantized **GGUF** file (`z_image_turbo_control_unified_v2.1_q4_k_m.gguf`). Simply configure the `infer_controlnet.py` script to point to the GGUF file.
**Key Features of this mode:**
* Loads the unified transformer from a single 4-bit quantized file.
* Enables aggressive `group_offload` to fit large models in consumer GPUs.
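A sketch of what GGUF loading can look like with a recent `diffusers` release, which supports GGUF checkpoints via `GGUFQuantizationConfig`. The import path and class names below are assumptions; `infer_controlnet.py` in this repository is the authoritative reference:

```python
import torch
from diffusers import GGUFQuantizationConfig

# Hypothetical imports: the real class names live in ./diffusers_local.
from diffusers_local import ZImageControlUnifiedPipeline, ZImageControlUnifiedTransformer

# Load the unified (base + control) transformer from the 4-bit GGUF file.
transformer = ZImageControlUnifiedTransformer.from_single_file(
    "./transformer/z_image_turbo_control_unified_v2.1_q4_k_m.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

pipe = ZImageControlUnifiedPipeline.from_pretrained(
    "alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

# Aggressive offloading so the model fits in 6-8 GB of VRAM; use_stream=True
# overlaps CPU<->GPU transfers with compute (the fix this repo ships).
pipe.transformer.enable_group_offload(
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="leaf_level",
    use_stream=True,
)
```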
### Option 2: High Precision (Diffusers/BF16)
Use this version if you have ample VRAM (e.g., 24GB+). Configure `infer_controlnet.py` to load the model using the standard `from_pretrained` directory structure for full **BFloat16** precision.
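For the full-precision path, loading can follow the standard `from_pretrained` pattern (the pipeline class name and import path are assumptions, as above):

```python
import torch
from diffusers_local import ZImageControlUnifiedPipeline  # hypothetical import path

# Load all components in BFloat16 from the repository's directory layout.
pipe = ZImageControlUnifiedPipeline.from_pretrained(
    "alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1",
    torch_dtype=torch.bfloat16,
).to("cuda")
```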
## 🛠️ Model Features & Configuration (V2)
### Original Features
- The ControlNet is injected into 15 transformer layer blocks and 2 refiner layer blocks.
- The model was trained from scratch for 70,000 steps on a dataset of 1 million high-quality images.
- Multiple control conditions: supports Canny, HED, Depth, Pose, and MLSD, used like a standard ControlNet.
- Adjust `controlnet_conditioning_scale` for stronger control; the optimal range is 0.65 to 0.90. For better stability, we highly recommend a detailed prompt.
- **Note on steps: as you increase the control strength, increase the number of inference steps accordingly to achieve better results.**
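Putting the recommendations above together, a hedged sketch of a ControlNet call (argument names follow common diffusers conventions and may differ from the actual `ZImageControlUnifiedPipeline` signature; see `infer_controlnet.py`):

```python
from diffusers.utils import load_image

# A preprocessed control map (Canny/HED/Depth/Pose/MLSD); the file name is illustrative.
control_image = load_image("pose.png")

image = pipe(
    prompt="Photorealistic portrait, sunny beach, sharp focus, ultra-detailed",
    control_image=control_image,
    controlnet_conditioning_scale=0.75,  # recommended range: 0.65-0.90
    num_inference_steps=25,              # raise steps along with control strength
).images[0]
image.save("controlnet_result.png")
```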
### New in V2
The optimized V2 model introduces several new features and parameters for enhanced control and flexibility:
* **Unified Pipeline:** A single pipeline now handles Text-to-Image, Image-to-Image, ControlNet, and Inpainting tasks.
* **Refiner Scale (`controlnet_refiner_conditioning_scale`):** Fine-grained control over the influence of the initial refiner layers, adjustable independently of the main `controlnet_conditioning_scale`.
* **Optional Refiner (`add_control_noise_refiner=False`):** You can now disable the control noise refiner layers when loading the model to save memory or for different stylistic results.
* **Inpainting Blur (`mask_blur_radius`):** A parameter to soften the edges of the inpainting mask for smoother transitions.
* **Backward Compatibility:** The model supports running weights from V1.
* **Group Offload Fixes:** The underlying code includes crucial fixes to ensure diffusers `group_offload` works correctly with `use_stream=True`, enabling efficient memory management without errors.
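The V2 parameters above might be combined in an inpainting call like this. The V2 parameter names come from this card; the remaining argument names are assumed to follow standard diffusers inpainting conventions, and `infer_inpaint.py` shows the exact usage:

```python
from diffusers.utils import load_image

init_image = load_image("scene.png")
mask = load_image("mask.png")      # convention assumed: white = repaint, black = keep
pose_map = load_image("pose.png")  # optional control map

result = pipe(
    prompt="a detailed description of the repainted region",
    image=init_image,
    mask_image=mask,
    control_image=pose_map,
    controlnet_conditioning_scale=0.75,
    controlnet_refiner_conditioning_scale=1.0,  # V2: refiner-layer influence only
    mask_blur_radius=8,                         # V2: feather the mask edges (pixel unit assumed)
    num_inference_steps=25,
).images[0]
result.save("inpaint_result.png")
```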
## 🏞️ V2 Examples: Refiner Scale Test
The new `controlnet_refiner_conditioning_scale` parameter allows for fine-tuning the control signal. Here is a comparison showing its effect while keeping the main control scale fixed.
**Prompt:** "Photorealistic portrait of a beautiful young East Asian woman with long, vibrant purple hair and a black bow. She is wearing a flowing white summer dress, standing on a sunny beach with a sparkling ocean and clear blue sky in the background. Bright natural sunlight, sharp focus, ultra-detailed."
**Control Image:** Pose.
| `controlnet_conditioning_scale=0.75, num_steps=25` | Refiner: Off | Refiner Scale: 0.75 | Refiner Scale: 1.0 | Refiner Scale: 1.5 | Refiner Scale: 2.0 |
|:---:|:---:|:---:|:---:|:---:|:---:|
| **Output** |  |  |  |  |  |
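A comparison grid like the one above can be reproduced with a simple sweep (same hedged signature assumptions as the other snippets; "Refiner: Off" is assumed here to correspond to a scale of 0.0):

```python
from diffusers.utils import load_image

pose_map = load_image("pose.png")
prompt = "Photorealistic portrait of a young woman with purple hair on a sunny beach"

# Sweep the refiner scale while keeping the main control scale fixed at 0.75.
for scale in (0.0, 0.75, 1.0, 1.5, 2.0):
    img = pipe(
        prompt=prompt,
        control_image=pose_map,
        controlnet_conditioning_scale=0.75,
        controlnet_refiner_conditioning_scale=scale,
        num_inference_steps=25,
    ).images[0]
    img.save(f"refiner_scale_{scale}.png")
```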
---
### New Tests with this pipeline
| Pose | Output |
|:---:|:---:|
|  |  |

| Canny | Output |
|:---:|:---:|
|  |  |

| HED | Output |
|:---:|:---:|
|  |  |

| Depth | Output |
|:---:|:---:|
|  |  |

| MLSD | Output |
|:---:|:---:|
|  |  |
### T2I and I2I Results
| T2I |
|:---:|
|  |

| I2I | Output |
|:---:|:---:|
|  |  |
## Original Scale Test Results
The table below shows generation results for different combinations of diffusion steps and control scale strength with the original model:
| Diffusion Steps | Scale 0.65 | Scale 0.70 | Scale 0.75 | Scale 0.8 | Scale 0.9 | Scale 1.0 |
|:---------------:|:----------:|:----------:|:----------:|:---------:|:---------:|:---------:|
| **9** |  |  |  |  |  |  |
| **10** |  |  |  |  |  |  |
| **20** |  |  |  |  |  |  |
| **30** |  |  |  |  |  |  |
| **40** |  |  |  |  |  |  |