arxiv:2411.16446

VQ-SGen: A Vector Quantized Stroke Representation for Creative Sketch Generation

Published on Nov 25, 2024

Authors:

Abstract

VQ-SGen uses a two-stage vector-quantized stroke representation and Transformer-based autodecoder to generate high-quality sketches with high fidelity, supporting text and class label conditioning.

AI-generated summary

This paper presents VQ-SGen, a novel algorithm for high-quality creative sketch generation. Recent approaches have framed the task as pixel-based generation either as a whole or part-by-part, neglecting the intrinsic and contextual relationships among individual strokes, such as the shape and spatial positioning of both proximal and distant strokes. To overcome these limitations, we propose treating each stroke within a sketch as an entity and introducing a vector-quantized (VQ) stroke representation for fine-grained sketch generation. Our method follows a two-stage framework - in stage one, we decouple each stroke's shape and location information to ensure the VQ representation prioritizes stroke shape learning. In stage two, we feed the precise and compact representation into an auto-decoding Transformer to incorporate stroke semantics, positions, and shapes into the generation process. By utilizing tokenized stroke representation, our approach generates strokes with high fidelity and facilitates novel applications, such as text or class label conditioned generation and sketch completion. Comprehensive experiments demonstrate our method surpasses existing state-of-the-art techniques on the CreativeSketch dataset, underscoring its effectiveness.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2411.16446 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2411.16446 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2411.16446 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.