Neta Cat Tower

Introduction

Neta Cat Tower is a text-to-image model fine-tuned from NetaYume Lumina.
It was trained to enhance anime-style output.
No training was done to add new characters.

Model Description

Developed by: nuko masshigura
Model type: Text-to-Image generative model based on Neta Lumina
License: Apache License 2.0
Finetuned from model: NetaYume Lumina

Model Components

Diffusion Transformers: This model
Text Encoder: Pre-trained Gemma-2-2b
AutoEncoder: Pre-trained AE from Flux.1 dev

"all_in_one" is a single model that combines the DiT, text encoder, and autoencoder.

How to Get Started with the Model

Please refer to the Neta Lumina model card.

Recommended settings

Sampler: res_multistep / euler_ancestral
Scheduler: linear_quadratic
Steps: >=30
CFG (guidance): 4 – 5.5
Resolution: 1024 × 1024, 768 × 1532, 968 × 1322, or >= 1024
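The sampler, scheduler, steps, and CFG values above can be kept in one place and checked before a run. The following is a minimal sketch, not part of any tool's API: `GenerationSettings` and `check_settings` are hypothetical names, and the thresholds are copied from the list above.

```python
from dataclasses import dataclass

# Values copied from the "Recommended settings" list above.
# All names here are hypothetical and exist only for illustration.
RECOMMENDED_SAMPLERS = ("res_multistep", "euler_ancestral")
RECOMMENDED_SCHEDULER = "linear_quadratic"
MIN_STEPS = 30
CFG_RANGE = (4.0, 5.5)


@dataclass
class GenerationSettings:
    sampler: str
    scheduler: str
    steps: int
    cfg: float


def check_settings(s: GenerationSettings) -> list:
    """Return a list of warnings; the list is empty if everything matches."""
    warnings = []
    if s.sampler not in RECOMMENDED_SAMPLERS:
        warnings.append(f"sampler '{s.sampler}' is not in {RECOMMENDED_SAMPLERS}")
    if s.scheduler != RECOMMENDED_SCHEDULER:
        warnings.append(f"scheduler '{s.scheduler}' is not '{RECOMMENDED_SCHEDULER}'")
    if s.steps < MIN_STEPS:
        warnings.append(f"steps {s.steps} is below {MIN_STEPS}")
    if not (CFG_RANGE[0] <= s.cfg <= CFG_RANGE[1]):
        warnings.append(f"cfg {s.cfg} is outside {CFG_RANGE}")
    return warnings


if __name__ == "__main__":
    settings = GenerationSettings("res_multistep", "linear_quadratic", 30, 4.5)
    for message in check_settings(settings):
        print("warning:", message)
```

Settings outside these ranges may still work; the check only flags departures from the recommendations.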

Prompt

Please refer to the Neta Lumina Prompt Book.
For character knowledge, please refer to the NetaYume Lumina Civitai page.

Training Information

v1

base model: NetaYume Lumina v3.5 (pre-trained)
dataset: 2.1k anime-style images with Danbooru tags and English captions
hardware: GeForce RTX 5090 x 1
training tool: sd-scripts
mixed_precision: bf16
save_precision: fp16
resolution: '1280,1280'
optimizer_type: AdamW8bit
learning_rate: 3e-5
lr_scheduler: warmup_stable_decay
train_epochs: 20
train_batch_size: 1
gradient_accumulation_steps: 4
min_snr_gamma: 5
ip_noise_gamma: 0.1
timestep_sampling: nextdit_shift
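As a rough worked example, the batch size, gradient accumulation, and epoch count above imply the following effective batch size and optimizer step count. This sketch assumes exactly 2100 images with no dataset repeats and ignores aspect-ratio bucketing, so the step count reported by the training tool may differ; `training_steps` is a hypothetical helper, not an sd-scripts function.

```python
import math


def training_steps(num_images, batch_size, grad_accum, epochs):
    """Estimate optimizer steps from the listed hyperparameters."""
    # One optimizer update consumes batch_size * grad_accum samples.
    effective_batch = batch_size * grad_accum
    # Round up so a final partial batch still counts as one update.
    steps_per_epoch = math.ceil(num_images / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs


# Assumption: "2.1k" is taken as exactly 2100 images.
effective_batch, steps_per_epoch, total_steps = training_steps(
    num_images=2100, batch_size=1, grad_accum=4, epochs=20
)
print(effective_batch, steps_per_epoch, total_steps)  # 4 525 10500
```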

Acknowledgments

duongve: Thanks to duongve for sharing an awesome model.

Model tree for nukomasshigura/Neta-Cat-Tower

Base model: Alpha-VLLM/Lumina-Image-2.0
Finetuned: duongve/NetaYume-Lumina-Image-2.0
Finetuned: this model