---
pipeline_tag: text-to-image
license: apache-2.0
base_model:
- neta-art/Neta-Lumina
- Alpha-VLLM/Lumina-Image-2.0
tags:
- stable-diffusion
- text-to-image
- comfyui
- diffusion-single-file
---
# NetaYume Lumina Image v2.0

This model is based on [Lumina-Image-2.0](https://huggingface.co/Alpha-VLLM/Lumina-Image-2.0), a flow-based diffusion transformer (DiT) with 2 billion parameters. For more information, visit the [Lumina-Image-2.0 repository](https://github.com/Alpha-VLLM/Lumina-Image-2.0).
---

**I. Introduction**

NetaYume Lumina is a text-to-image model fine-tuned from [Neta Lumina](https://huggingface.co/neta-art/Neta-Lumina), a high-quality anime-style image generation model developed by [Neta.art Lab](https://huggingface.co/neta-art). It builds upon [Lumina-Image-2.0](https://huggingface.co/Alpha-VLLM/Lumina-Image-2.0), an open-source base model released by the [Alpha-VLLM](https://huggingface.co/Alpha-VLLM) team at Shanghai AI Laboratory.

This model was trained with the goal of not only generating realistic human images but also producing high-quality anime-style images. Despite being fine-tuned on a specific dataset, it retains a significant amount of knowledge from the base model.

**Key Features:**
- **High-Quality Anime Generation**: Generates detailed anime-style images with sharp outlines, vibrant colors, and smooth shading.
- **Improved Character Understanding**: Better captures characters, especially those from the Danbooru dataset, resulting in more coherent and accurate character representations.
- **Enhanced Fine Details**: Accurately generates accessories, clothing textures, hairstyles, and background elements with greater clarity.

**Note:** The file `NetaYume_Lumina_v2_all_in_one.safetensors` is an all-in-one checkpoint that bundles the VAE, text encoder, and image backbone weights needed to run the model in ComfyUI.
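If you want to confirm which components the all-in-one file bundles before loading it in ComfyUI, the `safetensors` library can list tensor names without loading any weights. A minimal sketch, assuming the file has been downloaded to the current directory (the printed key prefixes depend on the checkpoint layout and are illustrative only):

```python
# Sketch: list the top-level key prefixes in the all-in-one checkpoint to see
# which component weights (VAE, text encoder, image backbone) it contains.
# Assumes the file sits in the current directory; prefixes vary by layout.
from safetensors import safe_open

with safe_open("NetaYume_Lumina_v2_all_in_one.safetensors", framework="pt") as f:
    prefixes = sorted({key.split(".")[0] for key in f.keys()})
    print(prefixes)
```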
---

**II. Model Components & Training Details**

- **Text Encoder**: Pre-trained **Gemma-2-2b**
- **Variational Autoencoder**: Pre-trained VAE from **Flux.1 dev**
- **Image Backbone**: Fine-tuned from **NetaLumina's image backbone**
---

**III. Suggestions**

**System Prompt:** A system prompt helps the model understand and align with your prompts, so you can generate your desired images more easily.
For anime-style images using Danbooru tags, either of the following system prompts works:

- You are an assistant designed to generate anime images based on textual prompts.
- You are an assistant designed to generate high-quality images based on user prompts and danbooru tags.
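Base Lumina-Image-2.0 workflows in ComfyUI pass the system prompt and the user prompt through a single text field, joined by a `<Prompt Start>` marker. A minimal sketch of that composition, assuming this checkpoint follows the same convention (the tags are an illustrative example):

```python
# Sketch: assemble a full Lumina 2 prompt string. The "<Prompt Start>"
# separator follows the base Lumina-Image-2.0 convention; verify it against
# your own ComfyUI workflow before relying on it.
system_prompt = (
    "You are an assistant designed to generate high-quality images "
    "based on user prompts and danbooru tags."
)
user_tags = "1girl, silver hair, school uniform, cherry blossoms, masterpiece"

full_prompt = f"{system_prompt} <Prompt Start> {user_tags}"
print(full_prompt)
```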
**Recommended Settings**

- CFG: 4–8
- Sampling Steps: 40–50
- Sampler:
  - Euler a (with scheduler: normal)
  - res_multistep (with scheduler: linear_quadratic)
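For use outside ComfyUI, the settings above translate directly into a diffusers call. A minimal sketch, assuming a diffusers-format export of this checkpoint is available (the repo id below is a placeholder, not a published repository) and that your diffusers version ships `Lumina2Pipeline`:

```python
# Sketch: text-to-image generation with the recommended settings.
# "your-namespace/NetaYume-Lumina-diffusers" is a placeholder repo id.
import torch
from diffusers import Lumina2Pipeline

pipe = Lumina2Pipeline.from_pretrained(
    "your-namespace/NetaYume-Lumina-diffusers",  # placeholder, not a real repo
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="1girl, silver hair, school uniform, cherry blossoms, masterpiece",
    system_prompt=(
        "You are an assistant designed to generate high-quality images "
        "based on user prompts and danbooru tags."
    ),
    guidance_scale=5.5,      # within the recommended CFG range of 4-8
    num_inference_steps=45,  # within the recommended 40-50 steps
).images[0]
image.save("netayume_sample.png")
```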
---

**IV. Acknowledgments**

- [narugo1992](https://huggingface.co/narugo) – for the invaluable Danbooru dataset
- [Alpha-VLLM](https://huggingface.co/Alpha-VLLM) – for creating a wonderful base model
- [Neta.art Lab](https://huggingface.co/neta-art) – for openly sharing the wonderful [Neta Lumina](https://huggingface.co/neta-art/Neta-Lumina) model