Scaryplasmon96 committed on
Commit 8a2afce · verified · 1 parent: 4d6ae63

Delete README.md 🥐

Files changed (1)
  1. README.md 🥐 +0 -742
README.md 🥐 DELETED
@@ -1,742 +0,0 @@
1
- ---
2
- license: apache-2.0
3
- language:
4
- - en
5
- base_model:
6
- - stabilityai/stable-diffusion-2-1
7
- pipeline_tag: image-to-image
8
- library_name: diffusers
9
- tags:
10
- - diffusers
11
- - stable-diffusion-21
12
- ---
13
-
14
- <h1>DoodlePix</h1>
15
- <h3>Diffusion-based Drawing Assistant</h3>
16
-
17
- <p><em>Aka</em></p>
18
-
19
- <p><em>Draw like a 5-year-old but get great results!</em></p>
20
-
21
- [![WebPage](https://img.shields.io/badge/Web-Page-green)](https://scaryplasmon.github.io/DoodlePix/)
22
-
23
- [![GitHubPage](https://img.shields.io/badge/Git-Hub-blue)](https://github.com/Scaryplasmon/DoodlePix)
24
-
25
-
26
- <p>
27
- <a href="assets/DoodlePixy.gif">
28
- <img src="assets/DoodlePixy.gif" alt="Doodle">
29
- </a>
30
- </p>
31
-
32
- <hr>
33
-
34
- <details>
35
- <summary><strong>Pipeline</strong></summary>
36
- <ul>
37
- <li><strong>Inference:</strong> fits in &lt; 4GB</li> <li><strong>Resolution:</strong> 512x512px</li>
38
- <li><strong>Speed:</strong> ~15 steps/second</li>
39
- </ul>
40
- </details>
41
-
42
- <hr>
43
-
44
- <details>
45
- <summary><strong>Training</strong></summary>
46
- <ul>
47
- <li><strong>Base Model:</strong> <a href="https://huggingface.co/stabilityai/stable-diffusion-2-1">StableDiffusion 2.1</a></li>
48
- <li><strong>Training Requirements:</strong> &lt; 14GB</li> <li><strong>Setup:</strong> NVIDIA RTX 4070</li>
49
- </ul>
50
- <p>
51
- <img src="assets/DoodlePix.png" alt="Training Loop" style="width:100%; height:auto; object-fit:contain;">
52
- </p>
53
- <p>
54
- The model is trained with the InstructPix2Pix pipeline, modified by adding a multilayer perceptron (FidelityMLP). Each training sample consists of an <code>input_image</code>, an <code>edited_target_image</code>, and a <code>text_prompt</code> containing an embedded fidelity token <code>f[0-9]</code>. Input images are encoded into latent space by the VAE, the prompt is processed by the CLIP text encoder, and the extracted fidelity value is passed through the FidelityMLP to produce a corresponding fidelity embedding.
55
- </p>
56
- <p>
57
- The core diffusion process trains a U-Net to predict the noise <code>epsilon</code> added to the VAE-encoded <em>edited target</em> latents. The U-Net is conditioned on both the fidelity-injected text embeddings (via cross-attention) and the VAE-encoded <em>input image</em> (doodle) latents, which, as in InstructPix2Pix, are concatenated channel-wise with the noisy latents.
58
- </p>
59
- <p>
60
- The optimization combines two loss terms:
61
- </p>
62
- <ol>
63
- <li>A reconstruction loss <code>||epsilon - epsilon_theta||²</code>, minimizing the MSE between the sampled noise <code>epsilon</code> and the U-Net's predicted noise <code>epsilon_theta</code>.</li>
64
- <li>A fidelity-aware L1 loss, calculated on decoded images <code>P_i</code>, which balances adherence to the original input <code>O_i</code> and the edited target <code>E_i</code> according to the normalized fidelity value <code>F</code>: <code>F • L1(P_i, O_i) + (1 - F) • L1(P_i, E_i)</code>.</li>
65
- </ol>
66
- <p>
67
- The total loss drives gradient updates via an AdamW optimizer, simultaneously training the U-Net and the FidelityMLP.
68
- </p>
69
- </details>
70
-
71
- <hr>
72
-
73
- <details>
74
- <summary><strong>Dataset</strong></summary>
75
- <ul>
76
- <li><strong>Data Size:</strong> ~4.5k images</li>
77
- <li><strong>Image Generation:</strong> DALL-E 3, FLUX.1-Redux-dev, SDXL</li>
78
- <li><strong>Edge Extraction:</strong> Canny, Fake Scribble, Scribble Xdog, HED soft edge, Manual</li>
79
- <li><strong>Doodles</strong> were hand-drawn and make up about 20% of the edge inputs</li>
80
- </ul>
81
- </details>
82
-
83
- <hr>
84
-
85
- <h2>Fidelity Embedding in Action</h2>
86
- <p><em>Fidelity values range from 0 to 9, with prompt, seed, and step count held constant.</em></p>
87
-
88
- <table style="width:100%; table-layout: fixed;">
89
- <tbody>
90
- <tr>
91
- <td colspan="5" style="text-align:center; font-weight:bold; font-size:0.9rem; padding-bottom:8px;">
92
- Prompt: f*, red heart, white background.
93
- </td>
94
- </tr>
95
- <tr>
96
- <td style="text-align:center;">
97
- <strong>Image</strong><br>
98
- <img src="assets/heart.png" alt="Heart Image" style="width:150px; height:150px; object-fit:contain;">
99
- </td>
100
- <td style="text-align:center;">
101
- <strong>Normal</strong><br>
102
- <img src="assets/Heart.gif" alt="Heart Normal" style="width:150px; height:150px; object-fit:contain;">
103
- </td>
104
- <td style="text-align:center;">
105
- <strong>3D</strong><br>
106
- <img src="assets/Heart3D.gif" alt="Heart 3D" style="width:150px; height:150px; object-fit:contain;">
107
- </td>
108
- <td style="text-align:center;">
109
- <strong>Outline</strong><br>
110
- <img src="assets/HeartOutline.gif" alt="Heart Outline" style="width:150px; height:150px; object-fit:contain;">
111
- </td>
112
- <td style="text-align:center;">
113
- <strong>Flat</strong><br>
114
- <img src="assets/HeartFlat.gif" alt="Heart Flat" style="width:150px; height:150px; object-fit:contain;">
115
- </td>
116
- </tr>
117
- </tbody>
118
- </table>
119
-
120
- <p>The model also accepts Canny edges as input, and fidelity injection remains effective.</p>
121
-
122
- <table style="width:100%; table-layout: fixed;">
123
- <tbody>
124
- <tr>
125
- <td colspan="5" style="text-align:center; font-weight:bold; font-size:0.9rem; padding-bottom:8px;">
126
- Prompt: f*, woman, portrait, frame. black hair, pink, black background.
127
- </td>
128
- </tr>
129
- <tr>
130
- <td style="text-align:center;">
131
- <strong>Image</strong><br>
132
- <img src="assets/woman.png" alt="Woman Image" style="width:150px; height:150px; object-fit:contain;">
133
- </td>
134
- <td style="text-align:center;">
135
- <strong>Normal</strong><br>
136
- <img src="assets/WomanNormal.gif" alt="Woman Normal" style="width:150px; height:150px; object-fit:contain;">
137
- </td>
138
- <td style="text-align:center;">
139
- <strong>3D</strong><br>
140
- <img src="assets/Woman3D.gif" alt="Woman 3D" style="width:150px; height:150px; object-fit:contain;">
141
- </td>
142
- <td style="text-align:center;">
143
- <strong>Outline</strong><br>
144
- <img src="assets/WomanOutline.gif" alt="Woman Outline" style="width:150px; height:150px; object-fit:contain;">
145
- </td>
146
- <td style="text-align:center;">
147
- <strong>Flat</strong><br>
148
- <img src="assets/WomanFlat.gif" alt="Woman Flat" style="width:150px; height:150px; object-fit:contain;">
149
- </td>
150
- </tr>
151
- </tbody>
152
- </table>
153
-
154
- <p>More Examples</p>
155
-
156
- <table style="width:100%; table-layout: fixed;">
157
- <tbody>
158
- <tr>
159
- <td colspan="2" style="text-align:center; font-weight:bold; font-size:0.9rem; padding-bottom:8px;">
160
- Prompt: f*, potion, bottle, cork. blue, brown, black background.
161
- </td>
162
- <td colspan="2" style="text-align:center; font-weight:bold; font-size:0.9rem; padding-bottom:8px;">
163
- Prompt: f*, maul, hammer. gray, brown, white background.
164
- </td>
165
- <td colspan="2" style="text-align:center; font-weight:bold; font-size:0.9rem; padding-bottom:8px;">
166
- Prompt: f*, torch, flame. red, brown, black background.
167
- </td>
168
- </tr>
169
- <tr>
170
- <td style="text-align:center;">
171
- <img src="assets/potion.png" alt="Potion Image" style="width:100%; max-width:150px; height:auto; object-fit:contain;">
172
- </td>
173
- <td style="text-align:center;">
174
- <img src="assets/PotionSingle.gif" alt="Potion Normal" style="width:100%; max-width:150px; height:auto; object-fit:contain;">
175
- </td>
176
- <td style="text-align:center;">
177
- <img src="assets/maul.png" alt="Maul Image" style="width:100%; max-width:150px; height:auto; object-fit:contain;">
178
- </td>
179
- <td style="text-align:center;">
180
- <img src="assets/maulNormal.gif" alt="Maul Normal" style="width:100%; max-width:150px; height:auto; object-fit:contain;">
181
- </td>
182
- <td style="text-align:center;">
183
- <img src="assets/torch.png" alt="Torch Image" style="width:100%; max-width:150px; height:auto; object-fit:contain;">
184
- </td>
185
- <td style="text-align:center;">
186
- <img src="assets/TorchSingle.gif" alt="Torch Normal" style="width:100%; max-width:150px; height:auto; object-fit:contain;">
187
- </td>
188
- </tr>
189
- </tbody>
190
- </table>
191
-
192
- <table style="width:100%; height: 140px; table-layout: fixed;">
193
- <tbody>
194
- <tr>
195
- <td colspan="6" style="text-align:center; font-style:italic; font-size:0.9rem; padding-bottom:0px;">
196
- </td>
197
- </tr>
198
- <tr>
199
- <td style="text-align:center;">
200
- input<br>
201
- <img src="assets/ringIn.png" alt="Input" style="width:120px; height:120px; object-fit:contain;">
202
- </td>
203
- <td style="text-align:center;">
204
- F0<br>
205
- <img src="assets/ringF0.webp" alt="Ring F0" style="width:120px; height:120px; object-fit:contain;">
206
- </td>
207
- <td style="text-align:center;">
208
- F9<br>
209
- <img src="assets/ringF9.webp" alt="Ring F9" style="width:120px; height:120px; object-fit:contain;">
210
- </td>
211
- <td style="text-align:center;">
212
- input<br>
213
- <img src="assets/fireIn.png" alt="Input" style="width:120px; height:120px; object-fit:contain;">
214
- </td>
215
- <td style="text-align:center;">
216
- F0<br>
217
- <img src="assets/fireF0.webp" alt="Fire F0" style="width:120px; height:120px; object-fit:contain;">
218
- </td>
219
- <td style="text-align:center;">
220
- F9<br>
221
- <img src="assets/fireF9.webp" alt="Fire F9" style="width:120px; height:120px; object-fit:contain;">
222
- </td>
223
- </tr>
224
- </tbody>
225
- </table>
226
-
227
- <h1>LoRAs</h1>
228
- <p>LoRA training lets you quickly bake a specific style into the model.</p>
229
-
230
- <table style="width:100%; height: 124px; table-layout: fixed;">
231
- <tbody>
232
- <tr>
233
- <td colspan="3" style="text-align:center; font-style:italic; font-size:0.9rem; padding-bottom:0px;">
234
- </td>
235
- </tr>
236
- <tr>
237
- <td style="text-align:center;">
238
- input<br>
239
- <img src="assets/Googh/sunflower_DR.png" alt="Input" style="width:200px; height:200px; object-fit:contain;">
240
- </td>
241
- <td style="text-align:center;">
242
- Googh<br>
243
- <img src="assets/Googh/sunflower_0.png" alt="Googh" style="width:200px; height:200px; object-fit:contain;">
244
- </td>
245
- <td style="text-align:center;">
246
- DontStarve<br>
247
- <img src="assets/DontStarve/SunFlowers_4.png" alt="DontStarve" style="width:200px; height:200px; object-fit:contain;">
248
- </td>
249
- </tr>
250
- <tr>
251
- <td style="text-align:center;">
252
- input<br>
253
- <img src="assets/Googh/gift_DR.png" alt="Input" style="width:200px; height:200px; object-fit:contain;">
254
- </td>
255
- <td style="text-align:center;">
256
- Googh<br>
257
- <img src="assets/Googh/gift_3.png" alt="Googh" style="width:200px; height:200px; object-fit:contain;">
258
- </td>
259
- <td style="text-align:center;">
260
- DontStarve<br>
261
- <img src="assets/DontStarve/gift_20.png" alt="DontStarve" style="width:200px; height:200px; object-fit:contain;">
262
- </td>
263
- </tr>
264
- </tbody>
265
- </table>
266
-
267
- <hr>
268
-
269
- <h2>LoRA Examples</h2>
270
-
271
- <details>
272
- <summary><h2>Googh</h2></summary>
273
- <p>LoRAs retain DoodlePix's styles and fidelity injection.</p>
274
-
275
- <table style="width:100%; height: 124px; table-layout: fixed;">
276
- <tbody>
277
- <tr>
278
- <td colspan="5" style="text-align:center; font-style:italic; font-size:0.9rem; padding-bottom:0px;">
279
- </td>
280
- </tr>
281
- <tr>
282
- <td style="text-align:center;">
283
- input<br>
284
- <img src="assets/Googh/man_DR2.png" alt="Input" style="width:150px; height:150px; object-fit:contain;">
285
- </td>
286
- <td style="text-align:center;">
287
- Normal<br>
288
- <img src="assets/Googh/manNormal.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
289
- </td>
290
- <td style="text-align:center;">
291
- 3D<br>
292
- <img src="assets/Googh/man3D.png" alt="3D" style="width:150px; height:150px; object-fit:contain;">
293
- </td>
294
- <td style="text-align:center;">
295
- Outline<br>
296
- <img src="assets/Googh/manOutline.png" alt="Outline" style="width:150px; height:150px; object-fit:contain;">
297
- </td>
298
- <td style="text-align:center;">
299
- Flat<br>
300
- <img src="assets/Googh/manFlat.png" alt="Flat" style="width:150px; height:150px; object-fit:contain;">
301
- </td>
302
- </tr>
303
- <tr>
304
- <td style="text-align:center;">
305
- Low Fidelity<br>
306
- <img src="assets/Googh/man_3.png" alt="Low Fidelity" style="width:150px; height:150px; object-fit:contain;">
307
- </td>
308
- <td style="text-align:center;">
309
- High Fidelity<br>
310
- <img src="assets/Googh/manFidelity7.png" alt="High Fidelity" style="width:150px; height:150px; object-fit:contain;">
311
- </td>
312
- <td colspan="3"></td>
313
- </tr>
314
- </tbody>
315
- </table>
316
-
317
-
318
- <table style="width:100%; height: 124px; table-layout: fixed;">
319
- <tbody>
320
- <tr>
321
- <td colspan="5" style="text-align:center; font-style:italic; font-size:0.9rem; padding-bottom:0px;">
322
- </td>
323
- </tr>
324
- <tr>
325
- <td style="text-align:center;">
326
- input<br>
327
- <img src="assets/Googh/gift_DR.png" alt="Input" style="width:150px; height:150px; object-fit:contain;">
328
- </td>
329
- <td style="text-align:center;">
330
- Normal<br>
331
- <img src="assets/Googh/giftNormal.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
332
- </td>
333
- <td style="text-align:center;">
334
- 3D<br>
335
- <img src="assets/Googh/gift3D.png" alt="3D" style="width:150px; height:150px; object-fit:contain;">
336
- </td>
337
- <td style="text-align:center;">
338
- Outline<br>
339
- <img src="assets/Googh/giftOutline.png" alt="Outline" style="width:150px; height:150px; object-fit:contain;">
340
- </td>
341
- <td style="text-align:center;">
342
- Flat<br>
343
- <img src="assets/Googh/giftFlat.png" alt="Flat" style="width:150px; height:150px; object-fit:contain;">
344
- </td>
345
- </tr>
346
- </tbody>
347
- </table>
348
-
349
- <p>More Examples:</p>
350
-
351
- <table style="width:100%; height: 140px; table-layout: fixed;">
352
- <tbody>
353
- <tr>
354
- <td colspan="3" style="text-align:center; font-style:italic; font-size:0.9rem; padding-bottom:0px;">
355
- </td>
356
- </tr>
357
- <tr>
358
- <td style="text-align:center;">
359
- <img src="assets/Googh/road_DR.png" alt="Input" style="width:200px; height:200px; object-fit:contain;">
360
- </td>
361
- <td style="text-align:center;">
362
- <img src="assets/Googh/road4.png" alt="Normal" style="width:200px; height:200px; object-fit:contain;">
363
- </td>
364
- <td style="text-align:center;">
365
- <img src="assets/Googh/road5.png" alt="3D" style="width:200px; height:200px; object-fit:contain;">
366
- </td>
367
- </tr>
368
- <tr>
369
- <td style="text-align:center;">
370
- <img src="assets/Googh/road7.png" alt="Outline" style="width:200px; height:200px; object-fit:contain;">
371
- </td>
372
- <td style="text-align:center;">
373
- <img src="assets/Googh/road6.png" alt="Flat" style="width:200px; height:200px; object-fit:contain;">
374
- </td>
375
- <td style="text-align:center;">
376
- <img src="assets/Googh/road3.png" alt="Flat" style="width:200px; height:200px; object-fit:contain;">
377
- </td>
378
- </tr>
379
- </tbody>
380
- </table>
381
-
382
- <table style="width:100%; height: 140px; table-layout: fixed;">
383
- <tbody>
384
- <tr>
385
- <td colspan="3" style="text-align:center; font-style:italic; font-size:0.9rem; padding-bottom:0px;">
386
- </td>
387
- </tr>
388
- <tr>
389
- <td style="text-align:center;">
390
- <img src="assets/Googh/flower_DR.png" alt="Input" style="width:200px; height:200px; object-fit:contain;">
391
- </td>
392
- <td style="text-align:center;">
393
- <img src="assets/Googh/flower1.png" alt="Normal" style="width:200px; height:200px; object-fit:contain;">
394
- </td>
395
- <td style="text-align:center;">
396
- <img src="assets/Googh/flower2.png" alt="3D" style="width:200px; height:200px; object-fit:contain;">
397
- </td>
398
- </tr>
399
- <tr>
400
- <td style="text-align:center;">
401
- <img src="assets/Googh/flower3.png" alt="Outline" style="width:200px; height:200px; object-fit:contain;">
402
- </td>
403
- <td style="text-align:center;">
404
- <img src="assets/Googh/flower4.png" alt="Flat" style="width:200px; height:200px; object-fit:contain;">
405
- </td>
406
- <td style="text-align:center;">
407
- <img src="assets/Googh/flower5.png" alt="Flat" style="width:200px; height:200px; object-fit:contain;">
408
- </td>
409
- </tr>
410
- </tbody>
411
- </table>
412
- </details>
413
-
414
- <hr>
415
-
416
- <details>
417
- <summary><h2>DontStarve</h2></summary>
418
-
419
- <table style="width:100%; height: 124px; table-layout: fixed;">
420
- <tbody>
421
- <tr>
422
- <td colspan="5" style="text-align:center; font-style:italic; font-size:0.9rem; padding-bottom:0px;">
423
- </td>
424
- </tr>
425
- <tr>
426
- <td style="text-align:center;">
427
- Flower<br>
428
- <img src="assets/DontStarve/flower_DR.png" alt="Input" style="width:150px; height:150px; object-fit:contain;">
429
- </td>
430
- <td style="text-align:center;">
431
- <br>
432
- <img src="assets/DontStarve/flower (1).png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
433
- </td>
434
- <td style="text-align:center;">
435
- <br>
436
- <img src="assets/DontStarve/flower (2).png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
437
- </td>
438
- <td style="text-align:center;">
439
- <br>
440
- <img src="assets/DontStarve/flower (3).png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
441
- </td>
442
- <td style="text-align:center;">
443
- <br>
444
- <img src="assets/DontStarve/flower (4).png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
445
- </td>
446
- </tr>
447
- <tr>
448
- <td style="text-align:center;">
449
- Gift<br>
450
- <img src="assets/DontStarve/gift_DR.png" alt="Input" style="width:150px; height:150px; object-fit:contain;">
451
- </td>
452
- <td style="text-align:center;">
453
- <br>
454
- <img src="assets/DontStarve/gift_14.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
455
- </td>
456
- <td style="text-align:center;">
457
- <br>
458
- <img src="assets/DontStarve/gift_15.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
459
- </td>
460
- <td style="text-align:center;">
461
- <br>
462
- <img src="assets/DontStarve/gift_16.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
463
- </td>
464
- <td style="text-align:center;">
465
- <br>
466
- <img src="assets/DontStarve/gift_17.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
467
- </td>
468
- </tr> <tr>
469
- <td style="text-align:center;">
470
- Carrot<br>
471
- <img src="assets/DontStarve/carrot_DR.png" alt="Input" style="width:150px; height:150px; object-fit:contain;">
472
- </td>
473
- <td style="text-align:center;">
474
- <br>
475
- <img src="assets/DontStarve/carrot_0.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
476
- </td>
477
- <td style="text-align:center;">
478
- <br>
479
- <img src="assets/DontStarve/carrot_1.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
480
- </td>
481
- <td style="text-align:center;">
482
- <br>
483
- <img src="assets/DontStarve/carrot_4.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
484
- </td>
485
- <td style="text-align:center;">
486
- <br>
487
- <img src="assets/DontStarve/carrot_6.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
488
- </td>
489
- </tr>
490
- <tr>
491
- <td style="text-align:center;">
492
- Rope<br>
493
- <img src="assets/DontStarve/rope_DR.png" alt="Input" style="width:150px; height:150px; object-fit:contain;">
494
- </td>
495
- <td style="text-align:center;">
496
- <br>
497
- <img src="assets/DontStarve/rope_0.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
498
- </td>
499
- <td style="text-align:center;">
500
- <br>
501
- <img src="assets/DontStarve/rope_3.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
502
- </td>
503
- <td style="text-align:center;">
504
- <br>
505
- <img src="assets/DontStarve/rope_4.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
506
- </td>
507
- <td style="text-align:center;">
508
- <br>
509
- <img src="assets/DontStarve/rope_5.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
510
- </td>
511
- </tr>
512
- <tr>
513
- <td style="text-align:center;">
514
- Potato<br>
515
- <img src="assets/DontStarve/potato_DR.png" alt="Input" style="width:150px; height:150px; object-fit:contain;">
516
- </td>
517
- <td style="text-align:center;">
518
- <br>
519
- <img src="assets/DontStarve/potato_0.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
520
- </td>
521
- <td style="text-align:center;">
522
- <br>
523
- <img src="assets/DontStarve/potato_1.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
524
- </td>
525
- <td style="text-align:center;">
526
- <br>
527
- <img src="assets/DontStarve/potato_5.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
528
- </td>
529
- <td style="text-align:center;">
530
- <br>
531
- <img src="assets/DontStarve/potato_6.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
532
- </td>
533
- </tr>
534
- <tr>
535
- <td style="text-align:center;">
536
- Heart<br>
537
- <img src="assets/heart.png" alt="Input" style="width:150px; height:150px; object-fit:contain;">
538
- </td>
539
- <td style="text-align:center;">
540
- <br>
541
- <img src="assets/DontStarve/heart_1.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
542
- </td>
543
- <td style="text-align:center;">
544
- <br>
545
- <img src="assets/DontStarve/heart_0.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
546
- </td>
547
- <td style="text-align:center;">
548
- <br>
549
- <img src="assets/DontStarve/heart_2.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
550
- </td>
551
- <td style="text-align:center;">
552
- <br>
553
- <img src="assets/DontStarve/heart_4.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
554
- </td>
555
- </tr>
556
- <tr>
557
- <td style="text-align:center;">
558
- Axe<br>
559
- <img src="assets/axe.png" alt="Input" style="width:150px; height:150px; object-fit:contain;">
560
- </td>
561
- <td style="text-align:center;">
562
- <br>
563
- <img src="assets/DontStarve/axe_0.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
564
- </td>
565
- <td style="text-align:center;">
566
- <br>
567
- <img src="assets/DontStarve/axe_2.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
568
- </td>
569
- <td style="text-align:center;">
570
- <br>
571
- <img src="assets/DontStarve/axe_3.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
572
- </td>
573
- <td style="text-align:center;">
574
- <br>
575
- <img src="assets/DontStarve/axe_5.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
576
- </td>
577
- </tr>
578
- <tr>
579
- <td style="text-align:center;">
580
- Potion<br>
581
- <img src="assets/potion.png" alt="Input" style="width:150px; height:150px; object-fit:contain;">
582
- </td>
583
- <td style="text-align:center;">
584
- <br>
585
- <img src="assets/DontStarve/potion_0.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
586
- </td>
587
- <td style="text-align:center;">
588
- <br>
589
- <img src="assets/DontStarve/potion_5.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
590
- </td>
591
- <td style="text-align:center;">
592
- <br>
593
- <img src="assets/DontStarve/potion_8.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
594
- </td>
595
- <td style="text-align:center;">
596
- <br>
597
- <img src="assets/DontStarve/potion_10.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
598
- </td>
599
- </tr>
600
- <tr>
601
- <td style="text-align:center;">
602
- Torch<br>
603
- <img src="assets/torch.png" alt="Input" style="width:150px; height:150px; object-fit:contain;">
604
- </td>
605
- <td style="text-align:center;">
606
- <br>
607
- <img src="assets/DontStarve/torch_0.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
608
- </td>
609
- <td style="text-align:center;">
610
- <br>
611
- <img src="assets/DontStarve/torch_1.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
612
- </td>
613
- <td style="text-align:center;">
614
- <br>
615
- <img src="assets/DontStarve/torch_2.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
616
- </td>
617
- <td style="text-align:center;">
618
- <br>
619
- <img src="assets/DontStarve/torch_3.png" alt="Normal" style="width:150px; height:150px; object-fit:contain;">
620
- </td>
621
- </tr>
622
- </tbody>
623
- </table>
624
- </details>
625
-
626
-
627
- <p>The model follows color instructions in the prompt closely: in the table below, only the color substituted for <code>*color</code> changes.</p>
628
-
629
- <table style="width:100%; height: 164px; table-layout: fixed;">
630
- <tbody>
631
- <tr>
632
- <td colspan="4" style="text-align:center; font-weight:bold; font-size:0.9rem; padding-bottom:8px;"> Prompt: f9, flower, stylized. *color, green, white
633
- </td>
634
- </tr>
635
- <tr>
636
- <td style="text-align:center;">
637
- <strong>input</strong><br>
638
- <img src="assets/flowerInput.png" alt="Flower Input" style="width:150px; height:150px; object-fit:contain;">
639
- </td>
640
- <td style="text-align:center;">
641
- <strong>red</strong><br>
642
- <img src="assets/flower2.png" alt="Flower red" style="width:150px; height:150px; object-fit:contain;">
643
- </td>
644
- <td style="text-align:center;">
645
- <strong>blue</strong><br>
646
- <img src="assets/flower3.png" alt="Flower light blue" style="width:150px; height:150px; object-fit:contain;">
647
- </td>
648
- <td style="text-align:center;">
649
- <strong>purple</strong><br>
650
- <img src="assets/flower4.png" alt="Flower purple" style="width:150px; height:150px; object-fit:contain;">
651
- </td>
652
- </tr>
653
- <tr>
654
- <td style="text-align:center;">
655
- <strong>green</strong><br>
656
- <img src="assets/flower1.png" alt="Flower green" style="width:150px; height:150px; object-fit:contain;">
657
- </td>
658
- <td style="text-align:center;">
659
- <strong>cyan</strong><br>
660
- <img src="assets/flower6.png" alt="Flower cyan" style="width:150px; height:150px; object-fit:contain;">
661
- </td>
662
- <td style="text-align:center;">
663
- <strong>yellow</strong><br>
664
- <img src="assets/flower7.png" alt="Flower yellow" style="width:150px; height:150px; object-fit:contain;">
665
- </td>
666
- <td style="text-align:center;">
667
- <strong>orange</strong><br>
668
- <img src="assets/flower8.png" alt="Flower orange" style="width:150px; height:150px; object-fit:contain;">
669
- </td>
670
- </tr>
671
- </tbody>
672
- </table>
673
-
674
- <details>
675
- <summary><strong>Limitations</strong></summary>
676
- <ul>
677
- <li>The <strong>model</strong> was trained mainly on objects and items rather than characters.</li>
678
- <li>It inherits most of the limitations of the StableDiffusion 2.1 model.</li>
679
- </ul>
680
- </details>
681
-
682
- <details>
683
- <summary><strong>Reasoning</strong></summary>
684
- <p>
685
- The objective is to train a model that takes drawings as input.
686
- </p>
687
- <p>
688
- While most models and ControlNets were trained on Canny or similar line extractors as inputs (which focus on the most prominent lines in an image),
689
- drawings are made with intention: a few squiggly lines in the right place can sometimes convey what is being represented much better:
690
- </p>
691
- <table style="width: 60%; table-layout: fixed; margin-left: auto; margin-right: auto;"> <tbody>
692
- <tr>
693
- <td style="text-align: center;">
694
- <strong>Drawing</strong><br>
695
- <img src="assets/alien/alienDrawing.png" alt="Drawing" style="width: 60%; max-width: 240px; height: auto; object-fit: contain;">
696
- </td>
697
- <td style="text-align: center;">
698
- <strong>Canny</strong><br>
699
- <img src="assets/alien/alienCanny.png" alt="Canny" style="width: 60%; max-width: 240px; height: auto; object-fit: contain;">
700
- </td>
701
- </tr>
702
- </tbody>
703
- </table>
704
- <p>
705
- Although the InstructPix2Pix pipeline provides an image guidance scale (<code>image_guidance_scale</code>) to control adherence to the input image, the output tends to follow the drawing too strictly at high values and loses compositional nuance at low values.
706
- </p>
707
- </details>
708
-
709
- <h1>TODOs</h1>
710
-
711
- <details>
712
- <summary><strong>DATA</strong></summary>
713
- <ul>
714
- <li>[ ] Increase amount of hand-drawn line inputs</li>
715
- <li>[X] Smaller-Bigger subject variations</li>
716
- <li>[ ] Background Variations</li>
717
- <li>[ ] Increase Flat style references</li>
718
- <li>[ ] Improve color matches in prompts</li>
719
- <li>[ ] Clean up</li>
720
- </ul>
721
- </details>
722
-
723
- <details>
724
- <summary><strong>Training</strong></summary>
725
- <ul>
726
- <li>[X] Release V1</li>
727
- <li>[ ] Release DoodleCharacters (DoodlePix but for characters)</li>
728
- <li>[X] Release Training code</li>
729
- <li>[X] Release Lora Training code</li>
730
- </ul>
731
- </details>
732
-
733
- <h2>Credits</h2>
734
- <ul>
735
- <li>This is a custom implementation of the <a href="https://github.com/huggingface/diffusers/blob/main/examples/instruct_pix2pix/train_instruct_pix2pix.py">Training</a> and <a href="https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_instruct_pix2pix.py">Pipeline</a> scripts from the <a href="https://github.com/huggingface/diffusers">Diffusers repo</a></li>
736
- <li>The dataset was generated using chat-based DALL-E 3, <a href="https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0">SDXL</a>, and <a href="https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev">FLUX-REDUX-DEV</a></li>
737
- <li>Edge extraction was made easy thanks to <a href="https://github.com/Fannovel16/comfyui_controlnet_aux">Fannovel16's ComfyUI Controlnet Aux</a></li>
738
- <li><a href="https://www.comfy.org/">ComfyUI</a> was a big part of the Data Development process</li>
739
- <li>Around 30% of the images were captioned using <a href="https://huggingface.co/vikhyatk/moondream2">Moondream2</a></li>
740
- <li>Dataset handling tools were built using <a href="https://doc.qt.io/qtforpython-6/index.html">PyQt</a></li>
741
- <li>Huge thanks to the open-source community for hosting and sharing so much cool stuff</li>
742
- </ul>
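The Training section of the deleted README says the text prompt carries an embedded fidelity token `f[0-9]` whose value is extracted and turned into a normalized fidelity `F`. The sketch below shows one way to do that extraction. It assumes, without confirmation from the README, that the token is a standalone `f` plus one digit, that the first such token is used, and that normalization is the linear map `F = f / 9`. The name `extract_fidelity` is hypothetical and does not come from the DoodlePix code.

```python
import re

# Assumed token format: a standalone "f" followed by a single digit (f0..f9),
# as in README prompts like "f9, flower, stylized. red, green, white".
FIDELITY_TOKEN = re.compile(r"\bf([0-9])\b")


def extract_fidelity(prompt: str) -> tuple[int, float]:
    """Return the raw fidelity digit and a normalized value F in [0, 1].

    Hypothetical helper. Assumptions (not confirmed by the README):
    the first f[0-9] token is used, and F = f / 9.
    """
    match = FIDELITY_TOKEN.search(prompt)
    if match is None:
        raise ValueError(f"no f[0-9] fidelity token in prompt: {prompt!r}")
    raw = int(match.group(1))
    return raw, raw / 9.0
```

For example, `extract_fidelity("f9, flower, stylized. red, green, white")` returns `(9, 1.0)`. The word boundaries keep words such as `flower` from matching. In the real pipeline, `F` (or the raw digit) would then be fed to the FidelityMLP, whose architecture the README does not describe.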
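The deleted README's Training section combines a noise-prediction MSE with a fidelity-aware L1 term on decoded images. This is a minimal sketch of that combination on flat lists of floats. It assumes the normalized fidelity `F` weights the input-adherence term `L1(P_i, O_i)` and `1 - F` weights the target term `L1(P_i, E_i)`. `l1_weight` is a hypothetical balancing factor, because the README only says the two terms are combined. The README also does not say how `P_i` is derived from the U-Net output, so the sketch takes it as an input. In actual training these would be tensor ops such as `torch.nn.functional.mse_loss` and `torch.nn.functional.l1_loss`.

```python
from typing import Sequence


def _check(a: Sequence[float], b: Sequence[float]) -> None:
    """Both inputs must be non-empty and of equal length."""
    if len(a) == 0 or len(a) != len(b):
        raise ValueError("inputs must be non-empty and of equal length")


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error, i.e. ||epsilon - epsilon_theta||^2 averaged."""
    _check(a, b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error between two flat sequences."""
    _check(a, b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def doodlepix_loss(eps, eps_theta, pred_img, orig_img, edit_img,
                   fidelity: float, l1_weight: float = 1.0) -> float:
    """Sketch: noise MSE + l1_weight * (F*L1(P,O) + (1-F)*L1(P,E)).

    `fidelity` is the normalized F in [0, 1]; `l1_weight` is hypothetical.
    """
    if not 0.0 <= fidelity <= 1.0:
        raise ValueError("fidelity must be normalized to [0, 1]")
    noise_loss = mse(eps, eps_theta)
    fidelity_loss = (fidelity * l1(pred_img, orig_img)            # stay close to the doodle O_i
                     + (1.0 - fidelity) * l1(pred_img, edit_img))  # stay close to the target E_i
    return noise_loss + l1_weight * fidelity_loss
```

With `F = 1` only adherence to the input drawing is penalized; with `F = 0` only distance to the edited target counts. Intermediate values interpolate linearly between the two.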
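The deleted README's Reasoning section contrasts its fidelity token with InstructPix2Pix's image guidance factor. For reference, InstructPix2Pix applies classifier-free guidance with two scales. $s_T$ corresponds to `guidance_scale` and $s_I$ to `image_guidance_scale` in the diffusers pipeline. $c_I$ is the input-image conditioning, $c_T$ the text conditioning, and $\varnothing$ a dropped condition:

```latex
\tilde{\epsilon}_\theta(z_t, c_I, c_T)
  = \epsilon_\theta(z_t, \varnothing, \varnothing)
  + s_I \big( \epsilon_\theta(z_t, c_I, \varnothing) - \epsilon_\theta(z_t, \varnothing, \varnothing) \big)
  + s_T \big( \epsilon_\theta(z_t, c_I, c_T) - \epsilon_\theta(z_t, c_I, \varnothing) \big)
```

A single global $s_I$ scales how strongly every denoising step is pulled toward the input drawing. This is the one inference-time knob the README describes as either too strict at high values or too loose at low values.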