---
license: apache-2.0
tags:
- text-to-image
- image-to-image
- inpainting
- controlnet
- diffusers
- gguf
- z-image-turbo
pipeline_tag: text-to-image
---

# Z-Image Turbo Control Unified V2 (V2.1)

[![Github](https://img.shields.io/badge/🎬%20Code-VideoX_Fun-blue)](https://github.com/aigc-apps/VideoX-Fun)
[![Original Repo](https://img.shields.io/badge/%F0%9F%A4%97%20Checkpoint-Original--Repo-yellow)](https://huggingface.co/alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1)

This repository hosts the **Z-Image Turbo Control Unified V2** model, a specialized architecture that unifies the powerful **Z-Image Turbo** base transformer with enhanced **ControlNet** capabilities in a single, cohesive model. The unified pipeline supports multiple generation modes in one place: **Text-to-Image, Image-to-Image, ControlNet, and Inpainting**.

Unlike traditional pipelines where ControlNet is an external add-on, this model integrates the control layers directly into the transformer structure. This enables **unified GGUF quantization**: the entire merged architecture (base + control) can be quantized (e.g., Q4_K_M, Q8_0) and run efficiently on consumer hardware with limited VRAM.

This version also introduces significant optimizations, architectural improvements, and bug fixes for features like `group_offload`.

## 📥 Installation

To set up the environment, install the dependencies:

```bash
# Create a virtual environment
python -m venv venv

# Activate it
source venv/bin/activate        # Linux/macOS
# venv\Scripts\activate         # Windows

# Upgrade pip
python -m pip install --upgrade pip

# Install requirements
pip install -r requirements.txt
```

*Note: This repository contains a `diffusers_local` folder with the custom `ZImageControlUnifiedPipeline` and transformer logic required to run this specific architecture.*

## 📂 Repository Structure

* `./transformer/z_image_turbo_control_unified_v2.1_q4_k_m.gguf`: The unified, quantized Q4_K_M model weights.
* `./transformer/z_image_turbo_control_unified_v2.1_q8_0.gguf`: The unified, quantized Q8_0 model weights.
* `infer_controlnet.py`: Script for ControlNet inference.
* `infer_inpaint.py`: Script for inpainting inference.
* `infer_t2i.py`: Script for text-to-image inference.
* `infer_i2i.py`: Script for image-to-image inference.
* `diffusers_local/`: Custom pipeline code (`ZImageControlUnifiedPipeline`) and transformer logic.
* `requirements.txt`: Python dependencies.

## 🚀 Usage

The primary script for inference is `infer_controlnet.py`, which is designed to handle all supported generation modes.

### Option 1: Low VRAM (GGUF) - Recommended

Use this option if you have limited VRAM (e.g., 6-8 GB). It loads the model from a quantized **GGUF** file (`z_image_turbo_control_unified_v2.1_q4_k_m.gguf`). Simply configure `infer_controlnet.py` to point to the GGUF file.

**Key features of this mode:**
* Loads the unified transformer from a single 4-bit quantized file.
* Enables aggressive `group_offload` to fit the model on consumer GPUs.

### Option 2: High Precision (Diffusers/BF16)

Use this option if you have ample VRAM (e.g., 24 GB+). Configure `infer_controlnet.py` to load the model with the standard `from_pretrained` directory structure for full **BFloat16** precision.
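The snippet below is a minimal loading sketch for both options, not the authoritative implementation. It assumes the custom transformer in `diffusers_local` supports diffusers' standard `from_single_file` + `GGUFQuantizationConfig` path and `enable_group_offload`; the transformer class name and the base-component source are illustrative placeholders, so refer to `infer_controlnet.py` for the exact entry points.

```python
# Minimal loading sketch; see infer_controlnet.py for the authoritative version.
# Assumptions: the custom transformer in diffusers_local follows diffusers'
# standard `from_single_file` + GGUFQuantizationConfig path; the transformer
# class name and the base-component repo below are illustrative placeholders.
import torch
from diffusers import GGUFQuantizationConfig

from diffusers_local import ZImageControlUnifiedPipeline            # custom pipeline from this repo
from diffusers_local import ZImageControlUnifiedTransformer2DModel  # hypothetical class name

# Option 1: Low VRAM - load the unified transformer from the quantized GGUF file.
transformer = ZImageControlUnifiedTransformer2DModel.from_single_file(
    "./transformer/z_image_turbo_control_unified_v2.1_q4_k_m.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

pipe = ZImageControlUnifiedPipeline.from_pretrained(
    "alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1",  # illustrative source for VAE/text encoder/scheduler
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

# Aggressive offloading for 6-8 GB GPUs; V2 ships fixes for use_stream=True.
pipe.transformer.enable_group_offload(
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="leaf_level",
    use_stream=True,
)

# Option 2: High precision - skip the GGUF file and load everything in BF16:
# pipe = ZImageControlUnifiedPipeline.from_pretrained(".", torch_dtype=torch.bfloat16).to("cuda")
```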
## 🛠️ Model Features & Configuration (V2)

### Original Features

- This ControlNet is added to 15 layer blocks and 2 refiner layer blocks.
- The model was trained from scratch for 70,000 steps on a dataset of 1 million high-quality images.
- **Multiple control conditions:** supports Canny, HED, Depth, Pose, and MLSD, which can be used like a standard ControlNet.
- You can adjust `controlnet_conditioning_scale` for stronger control. For better stability, we highly recommend using a detailed prompt. The optimal range for `controlnet_conditioning_scale` is 0.65 to 0.90.
- **Note on steps:** as you increase the control strength, it is recommended to increase the number of inference steps accordingly to achieve better results.

### V2 Enhancements

This optimized V2 model introduces several new features and parameters for enhanced control and flexibility:

* **Unified Pipeline:** A single pipeline now handles Text-to-Image, Image-to-Image, ControlNet, and Inpainting tasks.
* **Refiner Scale (`controlnet_refiner_conditioning_scale`):** Provides fine-grained control over the influence of the refiner layers, so they can be adjusted independently of the main ControlNet conditioning scale.
* **Optional Refiner (`add_control_noise_refiner=False`):** The control noise refiner layers can be disabled when loading the model to save memory or for different stylistic results.
* **Inpainting Blur (`mask_blur_radius`):** Softens the edges of the inpainting mask for smoother transitions.
* **Backward Compatibility:** The model supports running V1 weights.
* **Group Offload Fixes:** The underlying code includes fixes to ensure diffusers' `group_offload` works correctly with `use_stream=True`, enabling efficient memory management without errors.

## 🏞️ V2 Examples: Refiner Scale Test

The new `controlnet_refiner_conditioning_scale` parameter allows fine-tuning of the control signal. Here is a comparison showing its effect while keeping the main control scale fixed.

**Prompt:** "Photorealistic portrait of a beautiful young East Asian woman with long, vibrant purple hair and a black bow. She is wearing a flowing white summer dress, standing on a sunny beach with a sparkling ocean and clear blue sky in the background. Bright natural sunlight, sharp focus, ultra-detailed."

**Control Image:** Pose.

| `controlnet_conditioning_scale=0.75, num_steps=25` | Refiner: Off | Refiner Scale: 0.75 | Refiner Scale: 1.0 | Refiner Scale: 1.5 | Refiner Scale: 2.0 |
|:---:|:---:|:---:|:---:|:---:|:---:|
| **Output** | ![](results/refiner_scale_test/result_control_pose_0.75_off.png) | ![](results/refiner_scale_test/result_control_pose_0.75_0.75.png) | ![](results/refiner_scale_test/result_control_pose_0.75_1.0.png) | ![](results/refiner_scale_test/result_control_pose_0.75_1.5.png) | ![](results/refiner_scale_test/result_control_pose_0.75_2.0.png) |
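As a rough illustration of how one image of this comparison could be generated, here is a hedged call sketch. It reuses `pipe` from the loading sketch above; `controlnet_conditioning_scale`, `controlnet_refiner_conditioning_scale`, and `num_inference_steps` are the parameters described on this card, while the prompt and control-image argument names follow common diffusers conventions and may differ in `infer_controlnet.py`.

```python
from diffusers.utils import load_image

# Reuses `pipe` from the loading sketch above. Prompt / control-image argument
# names are assumptions based on common diffusers conventions.
pose_image = load_image("asset/pose.png")  # hypothetical path to a pose control map

prompt = (
    "Photorealistic portrait of a beautiful young East Asian woman with long, "
    "vibrant purple hair and a black bow. She is wearing a flowing white summer "
    "dress, standing on a sunny beach with a sparkling ocean and clear blue sky "
    "in the background. Bright natural sunlight, sharp focus, ultra-detailed."
)

image = pipe(
    prompt=prompt,
    control_image=pose_image,                    # Canny / HED / Depth / Pose / MLSD map
    controlnet_conditioning_scale=0.75,          # recommended range: 0.65 - 0.90
    controlnet_refiner_conditioning_scale=1.0,   # V2: influence of the refiner layers
    num_inference_steps=25,                      # raise steps with stronger control
).images[0]
image.save("pose_refiner_1.0.png")
```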
---

### New Tests with this pipeline

* Pose + Inpaint Output
* Pose Output
* Canny Output
* HED Output
* Depth Output
* MLSD Output
### T2I and I2I Results
* T2I Output
* I2I Output
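The dedicated scripts for these modes are `infer_t2i.py` and `infer_i2i.py`. For orientation only, here is a sketch of the same modes through the unified pipeline, assuming `pipe` from the loading sketch above; the `image` and `strength` argument names are assumptions based on common diffusers conventions.

```python
from diffusers.utils import load_image

# Text-to-image: no control or init image (see infer_t2i.py for the real script).
t2i = pipe(
    prompt="A cozy wooden cabin in a snowy pine forest at dusk, warm light in the windows",
    num_inference_steps=9,  # few-step settings also appear in the scale test below
).images[0]
t2i.save("t2i.png")

# Image-to-image: start from an existing picture (see infer_i2i.py).
# The `image` / `strength` argument names are assumptions.
init_image = load_image("asset/input.png")  # hypothetical input path
i2i = pipe(
    prompt="Turn the scene into a watercolor painting",
    image=init_image,
    strength=0.6,
).images[0]
i2i.save("i2i.png")
```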
## Original Scale Test Results

The table below shows the original model's generation results under different combinations of diffusion steps and control scale strength:

| Diffusion Steps | Scale 0.65 | Scale 0.70 | Scale 0.75 | Scale 0.8 | Scale 0.9 | Scale 1.0 |
|:---------------:|:----------:|:----------:|:----------:|:---------:|:---------:|:---------:|
| **9** | ![](results/scale_test/9_scale_0.65.png) | ![](results/scale_test/9_scale_0.70.png) | ![](results/scale_test/9_scale_0.75.png) | ![](results/scale_test/9_scale_0.8.png) | ![](results/scale_test/9_scale_0.9.png) | ![](results/scale_test/9_scale_1.0.png) |
| **10** | ![](results/scale_test/10_scale_0.65.png) | ![](results/scale_test/10_scale_0.70.png) | ![](results/scale_test/10_scale_0.75.png) | ![](results/scale_test/10_scale_0.8.png) | ![](results/scale_test/10_scale_0.9.png) | ![](results/scale_test/10_scale_1.0.png) |
| **20** | ![](results/scale_test/20_scale_0.65.png) | ![](results/scale_test/20_scale_0.70.png) | ![](results/scale_test/20_scale_0.75.png) | ![](results/scale_test/20_scale_0.8.png) | ![](results/scale_test/20_scale_0.9.png) | ![](results/scale_test/20_scale_1.0.png) |
| **30** | ![](results/scale_test/30_scale_0.65.png) | ![](results/scale_test/30_scale_0.70.png) | ![](results/scale_test/30_scale_0.75.png) | ![](results/scale_test/30_scale_0.8.png) | ![](results/scale_test/30_scale_0.9.png) | ![](results/scale_test/30_scale_1.0.png) |
| **40** | ![](results/scale_test/40_scale_0.65.png) | ![](results/scale_test/40_scale_0.70.png) | ![](results/scale_test/40_scale_0.75.png) | ![](results/scale_test/40_scale_0.8.png) | ![](results/scale_test/40_scale_0.9.png) | ![](results/scale_test/40_scale_1.0.png) |
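A grid like this can be produced by looping over step counts and scales. The sketch below reuses `pipe`, `prompt`, and `pose_image` from the earlier sketches; the seed handling and output naming are illustrative, not the settings used for the table above.

```python
import torch

# Steps x scale sweep mirroring the table above. Reuses `pipe`, `prompt`, and
# `pose_image` from the earlier sketches; seed and file naming are illustrative.
for steps in (9, 10, 20, 30, 40):
    for scale in (0.65, 0.70, 0.75, 0.8, 0.9, 1.0):
        out = pipe(
            prompt=prompt,
            control_image=pose_image,
            controlnet_conditioning_scale=scale,
            num_inference_steps=steps,
            generator=torch.Generator("cuda").manual_seed(42),  # fixed seed for comparability
        ).images[0]
        out.save(f"scale_test_{steps}_scale_{scale}.png")
```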