Hunyuan Image 3.0 - INT8 Quantized

This is an INT8 quantized version of Tencent's HunyuanImage-3.0 model, optimized for high-end GPU workflows without CPU offloading.

Model Description

INT8 quantization of the Hunyuan Image 3.0 text-to-image diffusion transformer, providing a balance between the full BF16 precision and more aggressive NF4 quantization. This version maintains excellent image quality while reducing memory requirements.

Key Features:

  • 🎯 High-quality output comparable to BF16
  • 💾 ~80GB VRAM required for weights (fits RTX 6000 Blackwell; RTX 6000 Ada needs offloading)
  • ⚡ ~3.5 minutes generation time at base resolution
  • 🔧 Designed for ComfyUI workflows

VRAM Requirements

Phase                    VRAM Usage
Weight Loading           ~80 GB
Inference (additional)   ~12-20 GB
Total                    ~92-100 GB

Recommended Hardware:

  • NVIDIA RTX 6000 Ada (48GB) - requires model splitting or CPU offload
  • NVIDIA RTX 6000 Blackwell (96GB) - fits entirely in VRAM ✅ (example workflows are on the GitHub page)
  • Multi-GPU setups with 80GB+ combined VRAM

Usage

ComfyUI (Recommended)

This model is designed to work with the Comfy_HunyuanImage3 custom nodes:

cd ComfyUI/custom_nodes
git clone https://github.com/EricRollei/Comfy_HunyuanImage3

Install the nodes and download this model to your ComfyUI models directory. The nodes handle INT8 loading automatically.

Direct Usage

# INT8 weights can be loaded with standard torch quantization
# See the ComfyUI nodes for reference implementation
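The repository comment above doesn't show the loading mechanics, so here is a minimal sketch of what INT8 weight reconstruction typically looks like. This is an illustration, not this repo's actual API: the function name `dequantize_int8` and the per-output-channel scale layout are assumptions; the real tensor names and layout come from the ComfyUI nodes.

```python
import numpy as np

def dequantize_int8(q_weight: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct an approximate FP32 weight from INT8 values and
    per-output-channel scales (scale shape: [out_channels, 1])."""
    return q_weight.astype(np.float32) * scale

# Hypothetical example: a small 4x8 weight quantized with per-row scales.
rng = np.random.default_rng(0)
w_fp32 = rng.standard_normal((4, 8)).astype(np.float32)
scale = np.abs(w_fp32).max(axis=1, keepdims=True) / 127.0
q = np.clip(np.round(w_fp32 / scale), -127, 127).astype(np.int8)

w_restored = dequantize_int8(q, scale)
max_err = float(np.abs(w_fp32 - w_restored).max())
print(f"max reconstruction error: {max_err:.4f}")
```

In a real loader the INT8 tensors and their scales would come out of the safetensors checkpoint; rounding error per element is bounded by half the channel's scale, which is why quality stays close to BF16.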

Performance

  • Generation Time: ~3.5 minutes for base resolution (1024x1024)
  • Weight Loading: ~60 seconds (one-time per session)
  • Quality: Excellent - minimal degradation from BF16
  • Speed: Faster inference than BF16 thanks to lower memory-bandwidth demands

Quantization Details

  • Method: INT8 per-channel quantization
  • Target: Hunyuan Image 3.0 transformer backbone
  • Precision Loss: Minimal - image quality remains high
  • Trade-off: Middle ground between NF4 (lower quality) and BF16 (highest VRAM)
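To see why per-channel scales (rather than a single per-tensor scale) keep precision loss minimal, the toy comparison below quantizes a matrix whose rows differ widely in magnitude, as transformer projection weights often do. The weight values here are synthetic, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy weight: four rows with magnitudes spanning three orders of magnitude.
row_mags = np.array([[0.01], [0.1], [1.0], [10.0]], dtype=np.float32)
w = rng.standard_normal((4, 64)).astype(np.float32) * row_mags

def mean_quant_error(w: np.ndarray, scale) -> float:
    """Round-trip through INT8 with the given scale, return mean abs error."""
    q = np.clip(np.round(w / scale), -127, 127)
    return float(np.abs(w - q * scale).mean())

# One scale for the whole tensor vs. one scale per output channel (row).
per_tensor_err = mean_quant_error(w, np.abs(w).max() / 127.0)
per_channel_err = mean_quant_error(w, np.abs(w).max(axis=1, keepdims=True) / 127.0)
print(f"per-tensor: {per_tensor_err:.5f}  per-channel: {per_channel_err:.5f}")
```

With a single scale, small-magnitude rows round almost entirely to zero; per-channel scales adapt to each row's range, which is the trade-off this model makes to stay close to BF16 quality.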

Original Model

This is a quantized derivative of Tencent's HunyuanImage-3.0.

Original Model Details: https://huggingface.co/tencent/HunyuanImage-3.0

Please review the original model card and license for full details on capabilities and restrictions.

Limitations

  • Requires high-end professional GPU (80GB+ VRAM)
  • Not suitable for consumer GPUs (4090, 5090) without further optimization
  • INT8 quantization may introduce minor quality differences in edge cases
  • Loading time adds ~1 minute overhead to first generation

Credits

Original Model: Tencent Hunyuan Team
Quantization: Eric Rollei
ComfyUI Integration: Comfy_HunyuanImage3

License

This model inherits the license from the original Hunyuan Image 3.0 model; see the original model card for the full license terms.

Citation

@misc{hunyuan-image-3-int8,
  author = {Rollei, Eric},
  title = {Hunyuan Image 3.0 INT8 Quantized},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/[YOUR_USERNAME]/[MODEL_NAME]}}
}

Original model citation:

@misc{tencent2024hunyuan,
  title={Hunyuan Image 3.0},
  author={Tencent Hunyuan Team},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/tencent/HunyuanImage-3.0}}
}
Format: Safetensors · 83B params · tensor types F32, BF16, I8