---
license: apache-2.0
base_model:
- duongve/NetaYume-Lumina-Image-2.0
pipeline_tag: text-to-image
---

# Neta Cat Tower

## Introduction

**Neta Cat Tower** is a text-to-image model fine-tuned from [NetaYume Lumina](https://huggingface.co/duongve/NetaYume-Lumina-Image-2.0). The model was trained to enhance its anime style; no training was performed to add new character knowledge.

### Model Description

- **Developed by:** [nuko masshigura](https://huggingface.co/nukomasshigura)
- **Model type:** Text-to-image generative model based on [Neta Lumina](https://huggingface.co/neta-art/Neta-Lumina)
- **License:** [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)
- **Finetuned from model:** [NetaYume Lumina](https://huggingface.co/duongve/NetaYume-Lumina-Image-2.0)

### Model Components

- Diffusion Transformer: this model
- Text Encoder: [pre-trained Gemma-2-2b](https://huggingface.co/neta-art/Neta-Lumina/blob/main/Text%20Encoder/gemma_2_2b_fp16.safetensors)
- AutoEncoder: [pre-trained Flux.1 dev AE](https://huggingface.co/neta-art/Neta-Lumina/blob/main/VAE/ae.safetensors)

The `all_in_one` file is a single checkpoint that bundles the DiT, text encoder, and autoencoder.
## How to Get Started with the Model

Please refer to the [Neta Lumina model card](https://huggingface.co/neta-art/Neta-Lumina).

### Recommended settings

- Sampler: res_multistep / euler_ancestral
- Scheduler: linear_quadratic
- Steps: >= 30
- CFG (guidance): 4 – 5.5
- Resolution: 1024 × 1024, 768 × 1532, 968 × 1322, or >= 1024

### Prompt

Please refer to the [Neta Lumina Prompt Book](https://www.neta.art/blog/neta_lumina_prompt_book/).

For character knowledge, please refer to the [NetaYume Lumina Civitai page](https://civitai.com/models/1790792).

## Training Information

### v1

- Base model: [NetaYume Lumina v3.5 (pre-trained)](https://civitai.com/models/1790792?modelVersionId=2298660)
- Dataset: 2.1k anime-style images with Danbooru tags and English captions
- Hardware: GeForce RTX 5090 × 1
- Training tool: sd-scripts
- mixed_precision: bf16
- save_precision: fp16
- resolution: '1280,1280'
- optimizer_type: AdamW8bit
- learning_rate: 3e-5
- lr_scheduler: warmup_stable_decay
- train_epochs: 20
- train_batch_size: 1
- gradient_accumulation_steps: 4
- min_snr_gamma: 5
- ip_noise_gamma: 0.1
- timestep_sampling: nextdit_shift

## Acknowledgments

- Thanks to [duongve](https://huggingface.co/duongve) for sharing an awesome model.
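The recommended settings above can be sketched as a minimal generation script. This is an assumption-laden sketch, not a confirmed workflow: it presumes the checkpoint can be loaded through diffusers' `Lumina2Pipeline` (the recommended `res_multistep` / `linear_quadratic` samplers are ComfyUI names with no exact diffusers equivalent), and the repo id passed to `from_pretrained` is hypothetical.

```python
# Generation settings taken from the "Recommended settings" section above.
settings = {
    "num_inference_steps": 30,  # Steps: >= 30
    "guidance_scale": 4.5,      # CFG (guidance): 4 - 5.5
    "width": 1024,              # e.g. 1024 x 1024
    "height": 1024,
}

def generate(prompt: str):
    """Hypothetical sketch: loading this checkpoint via diffusers is an
    assumption; the repo id below is a placeholder, not a confirmed path."""
    # Imports are local so the settings above can be inspected without
    # torch/diffusers installed.
    import torch
    from diffusers import Lumina2Pipeline  # assumed pipeline class

    pipe = Lumina2Pipeline.from_pretrained(
        "nukomasshigura/Neta-Cat-Tower",  # hypothetical repo id
        torch_dtype=torch.bfloat16,
    )
    pipe.to("cuda")
    return pipe(prompt=prompt, **settings).images[0]
```

In ComfyUI, prefer the workflow described in the Neta Lumina model card instead; this sketch only mirrors the numeric settings (steps, CFG, resolution), not the sampler/scheduler pair.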