An SD1.5 model with the SDXL VAE grafted on, then retrained so the two work together properly.

Currently available only in huggingface/diffusers format. A single-file "checkpoint" model may be generated later.
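Since the model is published in diffusers format, loading it might look like the sketch below. The `load_xlsd` helper name, the prompt, and the step count are illustrative assumptions, not part of the original notes. The default dtype is FP32 because the stock SDXL VAE is known to be fragile in FP16; whether this retrained model shares that issue is not stated here.

```python
def load_xlsd(repo_id="opendiffusionai/xlsd32-beta1", device="cuda", dtype=None):
    """Load the diffusers-format model as a ready-to-use pipeline.

    Hypothetical helper: the heavy imports are done lazily so that simply
    defining the function does not require torch or diffusers.
    """
    import torch
    from diffusers import DiffusionPipeline

    # Assumption: FP32 by default; pass dtype=torch.float16 at your own risk.
    if dtype is None:
        dtype = torch.float32

    # DiffusionPipeline reads model_index.json and picks the pipeline class.
    pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=dtype)
    return pipe.to(device)


# Example usage (downloads the weights on first call):
#   pipe = load_xlsd()
#   image = pipe("a photo of a lighthouse at dusk", num_inference_steps=30).images[0]
#   image.save("xlsd_sample.png")
```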

Creation notes:

dataset: 80k square images from CC12M and LAION2b-aesthetic, autocaptioned with LLaVA.

phase 1: FP32, b32a8 (batch size 32, gradient accumulation 8), optimi LION, constant LR 1e-5, for only 150 steps. Model locked except for the following layers: in, out, up.3, down.0

Note that the smaller number of trainable parameters lets us use b32 on a 4090 here.
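A minimal sketch of how the phase 1 locking could be done on a diffusers UNet. The mapping of "in, out, up.3, down.0" to the diffusers module names `conv_in`, `conv_out`, `up_blocks.3`, and `down_blocks.0` is an assumption, and `freeze_except` is a hypothetical helper, not the training script actually used. It only relies on `named_parameters()` and `requires_grad_()`, so it works with any PyTorch module.

```python
# Assumed diffusers names for "in, out, up.3, down.0"; the trailing dot keeps
# "up_blocks.3." from also matching a hypothetical "up_blocks.30".
TRAINABLE_PREFIXES = ("conv_in.", "conv_out.", "up_blocks.3.", "down_blocks.0.")


def freeze_except(model, trainable_prefixes=TRAINABLE_PREFIXES):
    """Freeze every parameter whose name does not start with a given prefix.

    Returns the names of the parameters left trainable, in iteration order.
    """
    prefixes = tuple(trainable_prefixes)
    trainable = []
    for name, param in model.named_parameters():
        keep = name.startswith(prefixes)
        param.requires_grad_(keep)  # in-place toggle on the parameter
        if keep:
            trainable.append(name)
    return trainable


# Example usage with a diffusers UNet (not run here):
#   names = freeze_except(pipe.unet)
#   params = [p for p in pipe.unet.parameters() if p.requires_grad]
#   # hand `params` to the optimizer
```

Passing only the unfrozen parameters to the optimizer is what keeps optimizer state small, which fits the note about b32 being usable on a 4090.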

phase 2: FP32, b16a16, optimi LION, initial LR 1e-5, linear schedule over 6 epochs (1920 effective steps). Picked the checkpoint from step 1800.
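The step counts can be sanity-checked with a little arithmetic. The sketch below reads "bXaY" as batch size X with gradient accumulation Y, which is consistent with the stated 1920 steps over 6 epochs of roughly 80k images. The `linear_lr` function assumes a linear decay to zero with no warmup; the exact scheduler settings are not given in the notes.

```python
def effective_batch(batch_size, accum_steps):
    """Images consumed per optimizer step."""
    return batch_size * accum_steps


def linear_lr(step, total_steps, initial_lr):
    """Assumed schedule: linear decay from initial_lr to 0, no warmup."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return initial_lr * remaining


phase1_batch = effective_batch(32, 8)    # b32a8
phase2_batch = effective_batch(16, 16)   # b16a16

steps_per_epoch = 1920 // 6                          # 320 optimizer steps
images_per_epoch = steps_per_epoch * phase2_batch    # close to the 80k dataset

# Learning rate at the chosen checkpoint, under the assumed schedule.
lr_at_1800 = linear_lr(1800, 1920, 1e-5)

print(phase1_batch, phase2_batch, steps_per_epoch, images_per_epoch, lr_at_1800)
```

Both phases end up with the same effective batch of 256. Under the assumed schedule, the learning rate at step 1800 is about 6.25e-7, so the picked checkpoint sits near the end of the decay.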

Phase 2 took around 15 hours, so total training time was roughly 16 hours.

Why 2-phase

In theory, phase 1 wasn't strictly necessary. However, without it, early retraining would most likely make very large changes to the core model that aren't strictly necessary for VAE retraining. So I picked the approach with minimal disruption.

Model tree for opendiffusionai/xlsd32-beta1

Base model: stable-diffusion-v1-5/stable-diffusion-v1-5

Finetuned: opendiffusionai/XLSD-V0.0

Finetuned: this model