---
license: apache-2.0
tags:
- text-to-image
- image-to-image
- inpainting
- controlnet
- diffusers
- gguf
- z-image-turbo
pipeline_tag: text-to-image
---

# Z-Image Turbo Control Unified V2 (V2.1)

[![Github](https://img.shields.io/badge/🎬%20Code-VideoX_Fun-blue)](https://github.com/aigc-apps/VideoX-Fun)
[![Original Repo](https://img.shields.io/badge/%F0%9F%A4%97%20Checkpoint-Original--Repo-yellow)](https://huggingface.co/alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1)

This repository hosts the **Z-Image Turbo Control Unified V2** model, a specialized architecture that unifies the powerful **Z-Image Turbo** base transformer with enhanced **ControlNet** capabilities in a single, cohesive model. The unified pipeline supports multiple generation modes in one place: **Text-to-Image, Image-to-Image, ControlNet, and Inpainting**.

Unlike traditional pipelines where ControlNet is an external add-on, this model integrates the control layers directly into the transformer structure. This enables **unified GGUF quantization**: the entire merged architecture (base + control) can be quantized (e.g., Q4_K_M, Q8_0) and run efficiently on consumer hardware with limited VRAM.

This version also introduces significant optimizations, architectural improvements, and bug fixes for features like `group_offload`.

## 📥 Installation

To set up the environment, install the dependencies:

```bash
# Create a virtual environment
python -m venv venv

# Activate it
source venv/bin/activate        # Linux/macOS
# venv\Scripts\activate         # Windows

# Upgrade pip
python -m pip install --upgrade pip

# Install requirements
pip install -r requirements.txt
```

*Note: This repository contains a `diffusers_local` folder with the custom `ZImageControlUnifiedPipeline` and transformer logic required to run this specific architecture.*

## 📂 Repository Structure

* `./transformer/z_image_turbo_control_unified_v2.1_q4_k_m.gguf`: The unified, quantized Q4_K_M model weights.
* `./transformer/z_image_turbo_control_unified_v2.1_q8_0.gguf`: The unified, quantized Q8_0 model weights.
* `infer_controlnet.py`: Script for ControlNet inference.
* `infer_inpaint.py`: Script for inpainting inference.
* `infer_t2i.py`: Script for text-to-image inference.
* `infer_i2i.py`: Script for image-to-image inference.
* `diffusers_local/`: Custom pipeline code (`ZImageControlUnifiedPipeline`) and transformer logic.
* `requirements.txt`: Python dependencies.

## 🚀 Usage

The primary script for inference is `infer_controlnet.py`, which is designed to handle all supported generation modes.

### Option 1: Low VRAM (GGUF) - Recommended

Use this option if you have limited VRAM (e.g., 6-8 GB). It loads the model from a quantized **GGUF** file (`z_image_turbo_control_unified_v2.1_q4_k_m.gguf`). Simply configure `infer_controlnet.py` to point to the GGUF file.

**Key features of this mode:**
* Loads the unified transformer from a single 4-bit quantized file.
* Enables aggressive `group_offload` to fit the model on consumer GPUs.

### Option 2: High Precision (Diffusers/BF16)

Use this option if you have ample VRAM (e.g., 24 GB+). Configure `infer_controlnet.py` to load the model with the standard `from_pretrained` directory structure for full **BFloat16** precision.
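The snippet below is a minimal loading sketch for both options, not the authoritative implementation. It assumes the custom transformer in `diffusers_local` supports diffusers' standard `from_single_file` + `GGUFQuantizationConfig` path and `enable_group_offload`; the transformer class name and the base-component source are illustrative placeholders, so refer to `infer_controlnet.py` for the exact entry points.

```python
# Minimal loading sketch; see infer_controlnet.py for the authoritative version.
# Assumptions: the custom transformer in diffusers_local follows diffusers'
# standard `from_single_file` + GGUFQuantizationConfig path; the transformer
# class name and the base-component repo below are illustrative placeholders.
import torch
from diffusers import GGUFQuantizationConfig

from diffusers_local import ZImageControlUnifiedPipeline            # custom pipeline from this repo
from diffusers_local import ZImageControlUnifiedTransformer2DModel  # hypothetical class name

# Option 1: Low VRAM - load the unified transformer from the quantized GGUF file.
transformer = ZImageControlUnifiedTransformer2DModel.from_single_file(
    "./transformer/z_image_turbo_control_unified_v2.1_q4_k_m.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

pipe = ZImageControlUnifiedPipeline.from_pretrained(
    "alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1",  # illustrative source for VAE/text encoder/scheduler
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

# Aggressive offloading for 6-8 GB GPUs; V2 ships fixes for use_stream=True.
pipe.transformer.enable_group_offload(
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="leaf_level",
    use_stream=True,
)

# Option 2: High precision - skip the GGUF file and load everything in BF16:
# pipe = ZImageControlUnifiedPipeline.from_pretrained(".", torch_dtype=torch.bfloat16).to("cuda")
```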
## 🛠️ Model Features & Configuration (V2)

### Original Features

- This ControlNet is added to 15 layer blocks and 2 refiner layer blocks.
- The model was trained from scratch for 70,000 steps on a dataset of 1 million high-quality images.
- **Multiple control conditions:** supports Canny, HED, Depth, Pose, and MLSD, which can be used like a standard ControlNet.
- You can adjust `controlnet_conditioning_scale` for stronger control. For better stability, we highly recommend using a detailed prompt. The optimal range for `controlnet_conditioning_scale` is 0.65 to 0.90.
- **Note on steps:** as you increase the control strength, it is recommended to increase the number of inference steps accordingly to achieve better results.

### V2 Enhancements

This optimized V2 model introduces several new features and parameters for enhanced control and flexibility:

* **Unified Pipeline:** A single pipeline now handles Text-to-Image, Image-to-Image, ControlNet, and Inpainting tasks.
* **Refiner Scale (`controlnet_refiner_conditioning_scale`):** Provides fine-grained control over the influence of the refiner layers, so they can be adjusted independently of the main ControlNet conditioning scale.
* **Optional Refiner (`add_control_noise_refiner=False`):** The control noise refiner layers can be disabled when loading the model to save memory or for different stylistic results.
* **Inpainting Blur (`mask_blur_radius`):** Softens the edges of the inpainting mask for smoother transitions.
* **Backward Compatibility:** The model supports running V1 weights.
* **Group Offload Fixes:** The underlying code includes fixes to ensure diffusers' `group_offload` works correctly with `use_stream=True`, enabling efficient memory management without errors.

## 🏞️ V2 Examples: Refiner Scale Test

The new `controlnet_refiner_conditioning_scale` parameter allows fine-tuning of the control signal. Here is a comparison showing its effect while keeping the main control scale fixed.

**Prompt:** "Photorealistic portrait of a beautiful young East Asian woman with long, vibrant purple hair and a black bow. She is wearing a flowing white summer dress, standing on a sunny beach with a sparkling ocean and clear blue sky in the background. Bright natural sunlight, sharp focus, ultra-detailed."

**Control Image:** Pose.

| `controlnet_conditioning_scale=0.75, num_steps=25` | Refiner: Off | Refiner Scale: 0.75 | Refiner Scale: 1.0 | Refiner Scale: 1.5 | Refiner Scale: 2.0 |
|:---:|:---:|:---:|:---:|:---:|:---:|
| **Output** | ![](results/refiner_scale_test/result_control_pose_0.75_off.png) | ![](results/refiner_scale_test/result_control_pose_0.75_0.75.png) | ![](results/refiner_scale_test/result_control_pose_0.75_1.0.png) | ![](results/refiner_scale_test/result_control_pose_0.75_1.5.png) | ![](results/refiner_scale_test/result_control_pose_0.75_2.0.png) |
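As a rough illustration of how one image of this comparison could be generated, here is a hedged call sketch. It reuses `pipe` from the loading sketch above; `controlnet_conditioning_scale`, `controlnet_refiner_conditioning_scale`, and `num_inference_steps` are the parameters described on this card, while the prompt and control-image argument names follow common diffusers conventions and may differ in `infer_controlnet.py`.

```python
from diffusers.utils import load_image

# Reuses `pipe` from the loading sketch above. Prompt / control-image argument
# names are assumptions based on common diffusers conventions.
pose_image = load_image("asset/pose.png")  # hypothetical path to a pose control map

prompt = (
    "Photorealistic portrait of a beautiful young East Asian woman with long, "
    "vibrant purple hair and a black bow. She is wearing a flowing white summer "
    "dress, standing on a sunny beach with a sparkling ocean and clear blue sky "
    "in the background. Bright natural sunlight, sharp focus, ultra-detailed."
)

image = pipe(
    prompt=prompt,
    control_image=pose_image,                    # Canny / HED / Depth / Pose / MLSD map
    controlnet_conditioning_scale=0.75,          # recommended range: 0.65 - 0.90
    controlnet_refiner_conditioning_scale=1.0,   # V2: influence of the refiner layers
    num_inference_steps=25,                      # raise steps with stronger control
).images[0]
image.save("pose_refiner_1.0.png")
```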
---

### New Tests with this pipeline

* Pose + Inpaint Output
* Pose Output
* Canny Output
* HED Output
* Depth Output
* MLSD Output
### T2I and I2I Results
* T2I Output
* I2I Output
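The dedicated scripts for these modes are `infer_t2i.py` and `infer_i2i.py`. For orientation only, here is a sketch of the same modes through the unified pipeline, assuming `pipe` from the loading sketch above; the `image` and `strength` argument names are assumptions based on common diffusers conventions.

```python
from diffusers.utils import load_image

# Text-to-image: no control or init image (see infer_t2i.py for the real script).
t2i = pipe(
    prompt="A cozy wooden cabin in a snowy pine forest at dusk, warm light in the windows",
    num_inference_steps=9,  # few-step settings also appear in the scale test below
).images[0]
t2i.save("t2i.png")

# Image-to-image: start from an existing picture (see infer_i2i.py).
# The `image` / `strength` argument names are assumptions.
init_image = load_image("asset/input.png")  # hypothetical input path
i2i = pipe(
    prompt="Turn the scene into a watercolor painting",
    image=init_image,
    strength=0.6,
).images[0]
i2i.save("i2i.png")
```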
## Original Scale Test Results

The table below shows the original model's generation results under different combinations of diffusion steps and control scale strength:

| Diffusion Steps | Scale 0.65 | Scale 0.70 | Scale 0.75 | Scale 0.8 | Scale 0.9 | Scale 1.0 |
|:---------------:|:----------:|:----------:|:----------:|:---------:|:---------:|:---------:|
| **9** | ![](results/scale_test/9_scale_0.65.png) | ![](results/scale_test/9_scale_0.70.png) | ![](results/scale_test/9_scale_0.75.png) | ![](results/scale_test/9_scale_0.8.png) | ![](results/scale_test/9_scale_0.9.png) | ![](results/scale_test/9_scale_1.0.png) |
| **10** | ![](results/scale_test/10_scale_0.65.png) | ![](results/scale_test/10_scale_0.70.png) | ![](results/scale_test/10_scale_0.75.png) | ![](results/scale_test/10_scale_0.8.png) | ![](results/scale_test/10_scale_0.9.png) | ![](results/scale_test/10_scale_1.0.png) |
| **20** | ![](results/scale_test/20_scale_0.65.png) | ![](results/scale_test/20_scale_0.70.png) | ![](results/scale_test/20_scale_0.75.png) | ![](results/scale_test/20_scale_0.8.png) | ![](results/scale_test/20_scale_0.9.png) | ![](results/scale_test/20_scale_1.0.png) |
| **30** | ![](results/scale_test/30_scale_0.65.png) | ![](results/scale_test/30_scale_0.70.png) | ![](results/scale_test/30_scale_0.75.png) | ![](results/scale_test/30_scale_0.8.png) | ![](results/scale_test/30_scale_0.9.png) | ![](results/scale_test/30_scale_1.0.png) |
| **40** | ![](results/scale_test/40_scale_0.65.png) | ![](results/scale_test/40_scale_0.70.png) | ![](results/scale_test/40_scale_0.75.png) | ![](results/scale_test/40_scale_0.8.png) | ![](results/scale_test/40_scale_0.9.png) | ![](results/scale_test/40_scale_1.0.png) |
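A grid like this can be produced by looping over step counts and scales. The sketch below reuses `pipe`, `prompt`, and `pose_image` from the earlier sketches; the seed handling and output naming are illustrative, not the settings used for the table above.

```python
import torch

# Steps x scale sweep mirroring the table above. Reuses `pipe`, `prompt`, and
# `pose_image` from the earlier sketches; seed and file naming are illustrative.
for steps in (9, 10, 20, 30, 40):
    for scale in (0.65, 0.70, 0.75, 0.8, 0.9, 1.0):
        out = pipe(
            prompt=prompt,
            control_image=pose_image,
            controlnet_conditioning_scale=scale,
            num_inference_steps=steps,
            generator=torch.Generator("cuda").manual_seed(42),  # fixed seed for comparability
        ).images[0]
        out.save(f"scale_test_{steps}_scale_{scale}.png")
```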