HunyuanImage-3.0-Nezha-Style-Adapter

Prompt
A young girl flying a kite on a windy hill in nezha style.
Prompt
A man riding a bicycle through heavy rain in nezha style.
Prompt
A greenish bronze guardian cooking dumplings in nezha style.
Prompt
A woman in flowing robes standing in a misty hall in nezha style.
Prompt
A grand temple above the clouds in nezha style.
Prompt
A soldier in armor holding a red tassel spear and shield standing on a battlefield in nezha style.

For more generation examples and their corresponding prompts, please refer to the images/more_examples folder and the captions.csv file.

Trigger words

Append "in nezha style." to the end of the prompt to trigger the style.
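
As a convenience, the trigger phrase can be appended programmatically. The helper below is a minimal sketch and is not part of this repository; the name `with_trigger` is hypothetical. It strips trailing punctuation, adds the phrase only if the prompt does not already end with it, and closes with a period to match the example prompts above.

```python
TRIGGER = "in nezha style"  # trigger phrase, stored without the final period


def with_trigger(prompt: str) -> str:
    """Return the prompt ending in 'in nezha style.' without duplicating it."""
    # Remove surrounding whitespace and trailing punctuation first.
    text = prompt.strip().rstrip(" .,;:!?")
    # Append the phrase only when it is not already the last words.
    if not text.lower().endswith(TRIGGER):
        text = f"{text} {TRIGGER}"
    return text + "."


print(with_trigger("A fox sleeping under a pine tree"))
# prints: A fox sleeping under a pine tree in nezha style.
```

The returned string can be passed as `prompt` to the generation call shown in the getting-started steps.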

Training

We use this repo for training.

This repository provides three Nezha-style adapters, each trained to a different convergence stage, resulting in varying stylistic intensity:

  • Nezha-Lite
  • Nezha-Standard
  • Nezha-Strong

⚠️ Language note:
All adapters were trained using English-only captions.
Prompts should be written in English for best results.

License

This adapter collection is a Model Derivative of tencent/HunyuanImage-3.0 and is distributed under the Tencent Hunyuan Community License Agreement.

Use of these weights must comply with the Tencent Hunyuan Community License and its territorial and usage restrictions. See the LICENSE and NOTICE files in this repository for details.

This project is not affiliated with, associated with, sponsored by, or endorsed by Tencent, nor by the producers or rightsholders of any Nezha-related films, series, or games. This adapter does not grant you any rights in those third-party intellectual properties.

Nothing in this repository constitutes legal advice. You are solely responsible for ensuring that your use of the model and any generated outputs complies with all applicable laws and third-party rights.

Usage and Restrictions

  • This model/adapter collection is provided for non-commercial research and personal use only.
    Commercial use (including using the model or its outputs in a paid product, service, or large-scale deployment) is not permitted without obtaining appropriate permissions and verifying all relevant rights.

  • Users must comply with:

    • The Tencent Hunyuan Community License Agreement (including territorial and acceptable-use limitations).
    • Any applicable copyright, trademark, and other IP rights related to third-party content.
  • You are solely responsible for ensuring that your use of this adapter collection and any generated outputs complies with applicable laws in your jurisdiction.


Model Details

Model Description

  • Developed by: Pixo
  • Model type: LoRA/adapter for style transfer
  • License: Tencent Hunyuan Community License Agreement
  • Finetuned from model: tencent/HunyuanImage-3.0
  • Language(s): English prompts

Dataset

This adapter was trained on a curated, proprietary image dataset designed to capture a stylized Chinese 3D fantasy animation aesthetic with fiery, dramatic lighting and exaggerated mythic characters.

As with most generative models, outputs may sometimes resemble or evoke existing copyrighted works. You are responsible for ensuring that your use of any generated images complies with applicable copyright and trademark laws in your jurisdiction.

Uses

Direct Use

  • Apply a Nezha-inspired cinematic animation aesthetic to images generated by HunyuanImage-3.0
  • Use by loading the base model and applying one of the provided adapters
  • Trigger phrase: "in nezha style." (must be the last words of the prompt)

Out-of-Scope Use

  • Harmful, deceptive, or NSFW content
  • Any use that violates the Tencent Hunyuan Community License Agreement
  • Any use that infringes third-party IP rights
  • Commercial use of the model or its outputs

Bias, Risks, and Limitations

  • Trained on a limited dataset; style may overfit in some scenarios
  • May exaggerate anatomy, motion, or proportions in complex scenes
  • English-only training limits effectiveness of non-English prompts
  • Style is intentionally dramatic and cinematic rather than realistic

How to Get Started with the Model

1️⃣ Clone the HunyuanImage-3.0 repo and download the base model

git clone https://github.com/Tencent-Hunyuan/HunyuanImage-3.0.git
cd HunyuanImage-3.0/

hf download tencent/HunyuanImage-3.0 --local-dir ./HunyuanImage-3

2️⃣ Download the adapter collection

hf download pixosg/HunyuanImage-3.0-Nezha-Style-Adapter --local-dir ./hunyuanimage-3-nezha-style-adapter
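
To check which adapter variants were downloaded, the local folder can be scanned for PEFT adapter configs. This is a minimal sketch, assuming each variant sits in its own subfolder containing an `adapter_config.json` file (the standard PEFT config filename); the helper name `list_adapters` is hypothetical and only the `nezha-standard` folder name is confirmed by the loading code in the next step.

```python
from pathlib import Path


def list_adapters(root: str) -> list[str]:
    """Return the names of subfolders of `root` that hold a PEFT adapter config."""
    base = Path(root)
    # ASSUMPTION: one subfolder per adapter variant, each with adapter_config.json.
    return sorted(cfg.parent.name for cfg in base.glob("*/adapter_config.json"))


if __name__ == "__main__":
    # Path taken from the download command above.
    print(list_adapters("./hunyuanimage-3-nezha-style-adapter"))
```

Any name this returns can be joined to the download folder and used as `adapter_model_path` in the next step.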

3️⃣ Load the base model and adapter

from peft import PeftModel
from hunyuan_image_3.hunyuan import HunyuanImage3ForCausalMM
import torch

model_id = "./HunyuanImage-3"
adapter_model_path = "./hunyuanimage-3-nezha-style-adapter/nezha-standard"

kwargs = dict(
    attn_implementation="sdpa", # Use "flash_attention_2" if available
    trust_remote_code=True,
    dtype=torch.bfloat16,
    device_map="auto",
    moe_impl="eager",
    moe_drop_tokens=False,
)

model = HunyuanImage3ForCausalMM.from_pretrained(model_id, **kwargs)
model.load_tokenizer(model_id)

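# Load the adapter. Use ONE of the two options below, not both.

# Option 1: load the adapter directly on the model
model.load_adapter(adapter_model_path)

# Option 2: wrap the model with PEFT instead (uncomment and remove Option 1).
# The two lambdas expose the token embedding (model.model.wte) through the
# standard get/set_input_embeddings accessors.
# model.get_input_embeddings = lambda: model.model.wte
# model.set_input_embeddings = lambda value: setattr(model.model, 'wte', value)
# model = PeftModel.from_pretrained(model, adapter_model_path, trust_remote_code=True)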

# Generate image
prompt = "A young girl flying a kite on a windy hill in nezha style."
image = model.generate_image(prompt=prompt, stream=True)
image.save("image.png")

4️⃣ Optimized Inference (8-bit Quantization)

While standard BF16 inference typically requires 3x NVIDIA A100 (80GB) GPUs, this 8-bit quantized configuration enables high-quality generation on a single NVIDIA H200 (141GB).

import torch
from transformers import BitsAndBytesConfig
from peft import PeftModel
from hunyuan_image_3.hunyuan import HunyuanImage3ForCausalMM

# ------------------------------------------------------------
# Patch (apply BEFORE model loading)
# Keep attention mask on the same device as the input tensor.
# ------------------------------------------------------------
_orig_prepare = HunyuanImage3ForCausalMM._prepare_attention_mask_for_generation

def _prepare_attention_mask_for_generation_patched(self, inputs_tensor, generation_config, model_kwargs):
    attn_mask = _orig_prepare(self, inputs_tensor, generation_config, model_kwargs)
    if attn_mask is not None and attn_mask.device != inputs_tensor.device:
        attn_mask = attn_mask.to(device=inputs_tensor.device)
    return attn_mask

HunyuanImage3ForCausalMM._prepare_attention_mask_for_generation = _prepare_attention_mask_for_generation_patched
# ------------------------------------------------------------

skip_modules = [
    "vae",
    "vision_model",
    "vision_aligner",
    "patch_embed",
    "timestep_emb",
    "time_embed",
    "time_embed_2",
    "final_layer",
    "lm_head",
]

quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=skip_modules,
    llm_int8_enable_fp32_cpu_offload=True,
)

model_id = "./HunyuanImage-3"
adapter_model_path = "./hunyuanimage-3-nezha-style-adapter/nezha-standard"

kwargs = dict(
    attn_implementation="sdpa", # Use "flash_attention_2" if available
    trust_remote_code=True,
    quantization_config=quant_config,
    dtype="auto",
    device_map="auto",
    moe_impl="eager",
    moe_drop_tokens=False,
)

model = HunyuanImage3ForCausalMM.from_pretrained(model_id, **kwargs)
model.load_tokenizer(model_id)

# Apply LoRA adapter
model.get_input_embeddings = lambda: model.model.wte
model.set_input_embeddings = lambda value: setattr(model.model, 'wte', value)
model = PeftModel.from_pretrained(model, adapter_model_path, trust_remote_code=True)

# Generate image
prompt = "A young girl flying a kite on a windy hill in nezha style."
image = model.generate_image(prompt=prompt, stream=True)
image.save("image.png")
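
The hardware figures quoted in step 4 can be sanity-checked with a rough weight-memory estimate. This is a back-of-the-envelope sketch, not a measurement: the parameter count of roughly 80 billion is an assumption about the base model and is not stated in this card, and the estimate ignores activations, the modules skipped from quantization, and framework overhead.

```python
GB = 1e9  # decimal gigabytes


def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed to hold the weights alone, in GB."""
    return n_params * bytes_per_param / GB


# ASSUMPTION: approximate total parameter count of the base model.
N_PARAMS = 80e9

bf16 = weight_memory_gb(N_PARAMS, 2)  # BF16 stores 2 bytes per parameter
int8 = weight_memory_gb(N_PARAMS, 1)  # 8-bit stores 1 byte per parameter

print(f"BF16 weights: ~{bf16:.0f} GB")
print(f"INT8 weights: ~{int8:.0f} GB")
```

Under that assumption, BF16 weights alone would fill two 80GB cards with no room left for activations, which is consistent with needing three, while the 8-bit weights would fit within a single 141GB card with headroom for the unquantized modules.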