Neta Cat Tower
Introduction
Neta Cat Tower is a text-to-image model fine-tuned from NetaYume Lumina.
This model was trained with the goal of enhancing anime style.
No training was done to add new characters.
Model Description
- Developed by: nuko masshigura
- Model type: Text-to-Image generative model based on Neta Lumina
- License: Apache License 2.0
- Finetuned from model: NetaYume Lumina
Model Components
- Diffusion Transformer (DiT): this model
- Text Encoder: Pre-trained Gemma-2-2b
- AutoEncoder: Pre-trained FLUX.1 [dev] autoencoder (AE)
"all_in_one" is a single checkpoint that combines the DiT, the text encoder, and the autoencoder.
How to Get Started with the Model
Please refer to the Neta Lumina model card.
Recommended settings
- Sampler: res_multistep or euler_ancestral
- Scheduler: linear_quadratic
- Steps: >=30
- CFG (guidance): 4 – 5.5
- Resolution: 1024 × 1024, 768 × 1532, 968 × 1322, or >= 1024
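The scheduler and CFG settings above can be illustrated with a short Python sketch. It is an approximation for intuition only, not the sampler code of any particular UI. `linear_quadratic_sigmas` is modeled on the `linear_quadratic` scheduler found in ComfyUI (which, as far as is known, derives from Genmo's Mochi); its defaults `threshold_noise=0.025` and `linear_steps=steps // 2` are assumptions taken from that implementation, not values stated in this card. `apply_cfg` shows the standard classifier-free guidance combination that the CFG value scales. Both function names are hypothetical.

```python
# Hedged sketch of two recommended sampling settings.
# Assumption: the schedule mirrors ComfyUI's `linear_quadratic` scheduler;
# threshold_noise=0.025 and linear_steps=steps // 2 are that implementation's
# defaults, not values from this model card. Function names are hypothetical.
from typing import List, Optional


def linear_quadratic_sigmas(steps: int, threshold_noise: float = 0.025,
                            linear_steps: Optional[int] = None) -> List[float]:
    """Return steps + 1 sigmas, from 1.0 (pure noise) down to 0.0."""
    if steps == 1:
        return [1.0, 0.0]
    if linear_steps is None:
        linear_steps = steps // 2
    # First phase: the denoising progress rises linearly from 0 toward threshold_noise.
    linear = [i * threshold_noise / linear_steps for i in range(linear_steps)]
    quadratic_steps = steps - linear_steps
    step_diff = linear_steps - threshold_noise * steps
    quad_coef = step_diff / (linear_steps * quadratic_steps ** 2)
    lin_coef = threshold_noise / linear_steps - 2 * step_diff / quadratic_steps ** 2
    const = quad_coef * linear_steps ** 2
    # Second phase: a quadratic that starts at threshold_noise and would reach 1.0 at i == steps.
    quadratic = [quad_coef * i ** 2 + lin_coef * i + const
                 for i in range(linear_steps, steps)]
    schedule = linear + quadratic + [1.0]
    # Flip so sampling starts at sigma = 1.0 and ends at sigma = 0.0.
    return [1.0 - x for x in schedule]


def apply_cfg(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance, elementwise: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    sigmas = linear_quadratic_sigmas(30)  # 30 steps, the recommended minimum
    print(len(sigmas), sigmas[0], sigmas[-1])
    print(apply_cfg([1.0, 2.0], [0.5, 0.5], 4.5))  # a CFG value inside the 4 to 5.5 range
```

With 30 steps, this schedule keeps sigma above 0.975 for the first 15 steps and spends the remaining steps on the quadratic descent to 0.0, which is one reason a higher step count tends to matter with it.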
Prompt
Please refer to the Neta Lumina Prompt Book.
For character knowledge, please refer to the NetaYume Lumina Civitai page.
Training Information
v1
- base model: NetaYume Lumina v3.5 (pre-trained)
- dataset: 2.1k anime-style images with Danbooru tags and English captions
- hardware: GeForce RTX 5090 × 1
- training tool: sd-scripts
- mixed_precision: bf16
- save_precision: fp16
- resolution: '1280,1280'
- optimizer_type: AdamW8bit
- learning_rate: 3e-5
- lr_scheduler: warmup_stable_decay
- train_epochs: 20
- train_batch_size: 1
- gradient_accumulation_steps: 4
- min_snr_gamma: 5
- ip_noise_gamma: 0.1
- timestep_sampling: nextdit_shift
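The effective batch size and the rough length of the v1 run follow from the settings above. The sketch below assumes the dataset is exactly 2,100 images with no repeats, and that each epoch yields one optimizer update per `train_batch_size × gradient_accumulation_steps` images. The warmup and decay lengths passed to `wsd_lr` are hypothetical because the card does not list them, and the linear decay is a simplification that may not match how sd-scripts shapes its `warmup_stable_decay` scheduler. Function names are illustrative only.

```python
# Hedged sketch: optimizer-step arithmetic for the v1 run and a generic
# warmup-stable-decay (WSD) learning-rate curve.
# Assumptions: "2.1k" is treated as exactly 2,100 images with no repeats;
# warmup/decay lengths are hypothetical (not listed in the card); decay is linear.
import math


def optimizer_steps(num_images: int, batch_size: int, grad_accum: int, epochs: int) -> int:
    """Optimizer updates = ceil(images / (batch_size * grad_accum)) per epoch, times epochs."""
    per_epoch = math.ceil(num_images / (batch_size * grad_accum))
    return per_epoch * epochs


def wsd_lr(step: int, total: int, peak_lr: float, warmup: int, decay: int,
           min_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, hold it constant, then decay linearly to min_lr."""
    if step < warmup:
        return peak_lr * (step + 1) / warmup
    if decay <= 0 or step < total - decay:
        return peak_lr
    # Fraction of the decay phase completed, clamped to at most 1.0.
    progress = min(1.0, (step - (total - decay)) / decay)
    return peak_lr + (min_lr - peak_lr) * progress


if __name__ == "__main__":
    total = optimizer_steps(2100, batch_size=1, grad_accum=4, epochs=20)
    print("effective batch size:", 1 * 4)
    print("optimizer steps:", total)
    # Hypothetical 500 warmup steps and 2,000 decay steps around learning_rate 3e-5.
    for step in (0, 499, 5000, 9500, total):
        print(step, wsd_lr(step, total, 3e-5, warmup=500, decay=2000))
```

Under those assumptions the run comes to 525 optimizer updates per epoch, or 10,500 in total, at an effective batch size of 4.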
Acknowledgments
- duongve: Thanks to duongve for sharing an awesome model.
Model tree for nukomasshigura/Neta-Cat-Tower
- Base model: Alpha-VLLM/Lumina-Image-2.0
- Finetuned: duongve/NetaYume-Lumina-Image-2.0