Z-Image Turbo Control Unified V2 (V2.1)

Original GitHub Repo

This repository hosts the Z-Image Turbo Control Unified V2 model. This is a specialized architecture that unifies the powerful Z-Image Turbo base transformer with enhanced ControlNet capabilities into a single, cohesive model. This unified pipeline supports multiple generation modes in one place: Text-to-Image, Image-to-Image, ControlNet, and Inpainting.

Unlike traditional pipelines where ControlNet is an external add-on, this model integrates control layers directly into the transformer structure. This enables Unified GGUF Quantization, allowing the entire merged architecture (Base + Control) to be quantized (e.g., Q4_K_M, Q8_0) and run efficiently on consumer hardware with limited VRAM. This version also introduces significant optimizations, architectural improvements, and bug fixes for features like group_offload.

πŸ“₯ Installation

To set up the environment, simply install the dependencies:

# Create a virtual environment
python -m venv venv

# Activate the venv (Linux/macOS)
source venv/bin/activate

# ... or on Windows
venv\Scripts\activate

# Upgrade pip
python -m pip install --upgrade pip

# Install the requirements
pip install -r requirements.txt

Note: This repository contains a diffusers_local folder with the custom ZImageControlUnifiedPipeline and transformer logic required to run this specific architecture.

πŸš€ Usage

πŸ“‚ Repository Structure

  • ./transformer/z_image_turbo_control_unified_v2.1_q4_k_m.gguf: The unified, quantized Q4_K_M model weights.
  • ./transformer/z_image_turbo_control_unified_v2.1_q8_0.gguf: The unified, quantized Q8_0 model weights.
  • infer_controlnet.py: Script for running ControlNet inference.
  • infer_inpaint.py: Script for running inpainting inference.
  • infer_t2i.py: Script for running text-to-image inference.
  • infer_i2i.py: Script for running image-to-image inference.
  • diffusers_local/: Custom pipeline code (ZImageControlUnifiedPipeline) and transformer logic.
  • requirements.txt: Python dependencies.

The primary script for inference is infer_controlnet.py, which is designed to handle all supported generation modes.

Option 1: Low VRAM (GGUF) - Recommended

Use this option if you have limited VRAM (e.g., 6-8 GB). It loads the model from a quantized GGUF file (z_image_turbo_control_unified_v2.1_q4_k_m.gguf); configure infer_controlnet.py to point at the GGUF file. A loading sketch follows the feature list below.

Key Features of this mode:

  • Loads the unified transformer from a single 4-bit quantized file.
  • Enables aggressive group_offload to fit large models in consumer GPUs.
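
A minimal loading sketch for this mode, assuming the custom classes in diffusers_local follow the standard diffusers GGUF pattern. The transformer class name ZImageTransformer2DModel and the base-weights path are assumptions; check diffusers_local and infer_controlnet.py for the exact names:

import torch
from diffusers import GGUFQuantizationConfig
# Both class names below are assumptions; check diffusers_local for the real ones.
from diffusers_local import ZImageControlUnifiedPipeline, ZImageTransformer2DModel

# Load the unified transformer (Base + Control) from the 4-bit GGUF file.
transformer = ZImageTransformer2DModel.from_single_file(
    "transformer/z_image_turbo_control_unified_v2.1_q4_k_m.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

# Assemble the pipeline around the quantized transformer; the remaining
# components (VAE, text encoder, scheduler) load from the repo layout.
pipe = ZImageControlUnifiedPipeline.from_pretrained(
    ".", transformer=transformer, torch_dtype=torch.bfloat16
)

# Aggressive group offload: stream weight groups between CPU and GPU so the
# model fits in roughly 6-8 GB of VRAM.
pipe.transformer.enable_group_offload(
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="leaf_level",
    use_stream=True,
)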

Option 2: High Precision (Diffusers/BF16)

Use this option if you have ample VRAM (e.g., 24 GB+). Configure infer_controlnet.py to load the model via the standard from_pretrained directory structure for full BFloat16 precision.
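
A minimal sketch for this mode, again assuming the pipeline class name from diffusers_local:

import torch
from diffusers_local import ZImageControlUnifiedPipeline  # class name assumed

# Full BFloat16 load from the repo's diffusers directory layout.
pipe = ZImageControlUnifiedPipeline.from_pretrained(
    ".", torch_dtype=torch.bfloat16
).to("cuda")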

πŸ› οΈ Model Features & Configuration (V2)

Original Features

  • The control layers are attached to 15 transformer blocks and 2 refiner blocks.
  • The model was trained from scratch for 70,000 steps on a dataset of 1 million high-quality images.
  • Multiple Control Conditions: supports Canny, HED, Depth, Pose, and MLSD maps, which can be used like a standard ControlNet.
  • You can increase controlnet_conditioning_scale for stronger control; the optimal range is 0.65 to 0.90. For better stability, we highly recommend using a detailed prompt (see the call sketch after this list).
    • Note on steps: as you increase the control strength, it is recommended to increase the number of inference steps accordingly for better results.
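
As referenced above, here is a hedged sketch of a ControlNet call with these settings. pipe is loaded as in either option above; the argument names control_image and controlnet_conditioning_scale mirror standard diffusers ControlNet pipelines and should be verified against infer_controlnet.py:

from diffusers.utils import load_image

# Any supported condition map works here (Canny, HED, Depth, Pose, MLSD);
# the path is a placeholder.
control_image = load_image("pose_map.png")

image = pipe(
    # Detailed prompts improve stability at stronger control scales.
    prompt="Photorealistic portrait of a young woman on a sunny beach, sharp focus, ultra-detailed",
    control_image=control_image,
    controlnet_conditioning_scale=0.75,  # recommended range: 0.65-0.90
    num_inference_steps=20,              # raise steps with stronger control
).images[0]
image.save("controlnet_out.png")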

This optimized V2 model introduces several new features and parameters for enhanced control and flexibility (a sketch exercising them follows the list):

  • Unified Pipeline: A single pipeline now handles Text-to-Image, Image-to-Image, ControlNet, and Inpainting tasks.
  • Refiner Scale (controlnet_refiner_conditioning_scale): Fine-grained control over the influence of the refiner layers, so they can be adjusted independently of the main controlnet_conditioning_scale.
  • Optional Refiner (add_control_noise_refiner=False): You can now disable the control noise refiner layers when loading the model to save memory or for different stylistic results.
  • Inpainting Blur (mask_blur_radius): A parameter to soften the edges of the inpainting mask for smoother transitions.
  • Backward Compatibility: The model supports running weights from V1.
  • Group Offload Fixes: The underlying code includes crucial fixes to ensure diffusers group_offload works correctly with use_stream=True, enabling efficient memory management without errors.
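
A sketch exercising the V2 additions in a single inpainting-plus-ControlNet call. The argument names are taken from the list above, but the exact signatures (and whether add_control_noise_refiner is passed when loading the transformer or the pipeline) should be checked in diffusers_local:

from diffusers.utils import load_image

# Placeholder inputs for illustration.
init_image = load_image("input.png")   # I2I / inpainting source
mask = load_image("mask.png")          # white = region to repaint
pose_map = load_image("pose_map.png")  # control condition

image = pipe(
    prompt="a detailed prompt describing the desired result",
    image=init_image,
    mask_image=mask,
    mask_blur_radius=8,                          # V2: soften mask edges
    control_image=pose_map,
    controlnet_conditioning_scale=0.75,
    controlnet_refiner_conditioning_scale=1.0,   # V2: refiner-layer influence
).images[0]

# V2: pass add_control_noise_refiner=False at load time to drop the refiner
# layers and save memory.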

🏞️ V2 Examples: Refiner Scale Test

The new controlnet_refiner_conditioning_scale parameter allows for fine-tuning the control signal. Here is a comparison showing its effect while keeping the main control scale fixed.

Prompt: "Photorealistic portrait of a beautiful young East Asian woman with long, vibrant purple hair and a black bow. She is wearing a flowing white summer dress, standing on a sunny beach with a sparkling ocean and clear blue sky in the background. Bright natural sunlight, sharp focus, ultra-detailed." Control Image: Pose.

Settings: controlnet_conditioning_scale=0.75, num_steps=25. Columns compare Refiner Off, Refiner Scale 0.75, Refiner Scale 1.0, Refiner Scale 1.5, and Refiner Scale 2.0. (Output images not reproduced here.)

New Tests with this pipeline

Per-mode example outputs (images not reproduced here): Pose + Inpaint, Pose, Canny, HED, Depth, MLSD.

T2I and I2I Results

Example outputs (images not reproduced here): T2I and I2I.

Original Scale Test Results

The table below shows the generation results under different combinations of Diffusion steps and Control Scale strength from the original model:

Result grid (images not reproduced here): rows are Diffusion Steps 9, 10, 20, 30, and 40; columns are Scale 0.65, 0.70, 0.75, 0.8, 0.9, and 1.0.