# DiffSketcher

This is a Hugging Face implementation of [DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models](https://arxiv.org/abs/2306.14685).
## Model Description
DiffSketcher is a novel approach for synthesizing vector sketches from text prompts by leveraging the power of latent diffusion models. It extracts cross-attention maps from a pre-trained text-to-image diffusion model and uses them to guide the optimization of vector sketches.
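The guidance idea can be illustrated with a toy example. This is purely illustrative and not the actual DiffSketcher implementation: it treats a synthetic 2D attention map as the target and moves the control points of a "stroke" toward the attention peak by gradient descent, standing in for the attention-guided optimization of vector paths.

```python
import numpy as np

# Toy stand-in for a cross-attention map: a 2D Gaussian bump peaking at (x=20, y=12).
H = W = 32
ys, xs = np.mgrid[0:H, 0:W]
attn = np.exp(-(((xs - 20) ** 2 + (ys - 12) ** 2) / 30.0))
attn /= attn.sum()

# Control points of one "stroke", initialized at the canvas center.
points = np.full((8, 2), 16.0)  # rows are (x, y) pairs

def loss_and_grad(points):
    """Quadratic pull toward the attention peak; returns loss and d_loss/d_points."""
    # unravel_index yields (row, col) = (y, x); reverse to (x, y).
    peak = np.array(np.unravel_index(attn.argmax(), attn.shape))[::-1]
    diff = points - peak
    return 0.5 * (diff ** 2).sum(), diff

lr = 0.1
for step in range(200):
    loss, grad = loss_and_grad(points)
    points -= lr * grad

print(points.round(1))  # all points converge to the attention peak (20, 12)
```

The real method optimizes Bézier stroke parameters through a differentiable rasterizer against losses derived from the diffusion model; this sketch only conveys the "attention pulls strokes" intuition.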
## Usage
```python
from diffusers import DiffusionPipeline

# Load the pipeline
pipeline = DiffusionPipeline.from_pretrained("jree423/diffsketcher")

# Generate a vector sketch
result = pipeline(
    prompt="A beautiful sunset over the mountains",
    negative_prompt="ugly, blurry",
    num_paths=96,
    token_ind=4,
    num_iter=800,
    guidance_scale=7.5,
    width=1.5,
    seed=42,
)

# Access the SVG string and rendered image
svg_string = result["svg"]
image = result["image"]

# Save the SVG
with open("sunset_sketch.svg", "w") as f:
    f.write(svg_string)

# Save the image
image.save("sunset_sketch.png")
```
## Parameters
- **prompt** (`str`): The text prompt to guide the sketch generation.
- **negative_prompt** (`str`, optional): Negative text prompt for guidance.
- **num_paths** (`int`, optional): Number of paths to use in the sketch. Default is 96.
- **token_ind** (`int`, optional): Token index for attention. Default is 4.
- **num_iter** (`int`, optional): Number of optimization iterations. Default is 800.
- **guidance_scale** (`float`, optional): Scale for classifier-free guidance. Default is 7.5.
- **width** (`float`, optional): Stroke width. Default is 1.5.
- **seed** (`int`, optional): Random seed for reproducibility.
- **return_dict** (`bool`, optional): Whether to return a dict or a tuple. Default is `True`.
- **output_type** (`str`, optional): Output type, one of `"pil"`, `"np"`, or `"svg"`. Default is `"pil"`.
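Because the SVG output is a plain string, it can be post-processed with standard XML tooling. Below is a minimal sketch using a hand-written stand-in string in place of `result["svg"]` (a real result would contain `num_paths` path elements, assuming the pipeline emits standard `<path>` tags):

```python
import xml.etree.ElementTree as ET

# Stand-in for result["svg"] with two strokes; purely illustrative.
svg_string = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="224" height="224">'
    '<path d="M 10 10 C 20 20, 40 20, 50 10" stroke="black" fill="none"/>'
    '<path d="M 10 60 C 20 80, 40 80, 50 60" stroke="black" fill="none"/>'
    '</svg>'
)

root = ET.fromstring(svg_string)
ns = "{http://www.w3.org/2000/svg}"
paths = root.findall(f"{ns}path")
print(len(paths))  # 2

# Rescale the rendered size without touching the path geometry:
# keep the original coordinate system via viewBox and enlarge width/height.
root.set("width", "512")
root.set("height", "512")
root.set("viewBox", "0 0 224 224")
scaled = ET.tostring(root, encoding="unicode")
```

The same pattern works for counting, filtering, or restyling strokes in the generated sketch before saving it.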
## Citation
```bibtex
@article{xing2023diffsketcher,
  title={DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models},
  author={Xing, Ximing and Wang, Chuang and Zhou, Haitao and Zhang, Jing and Yu, Qian and Xu, Dong},
  journal={arXiv preprint arXiv:2306.14685},
  year={2023}
}
```