CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model Paper • 2503.06472 • Published Mar 9 • 8
MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning Paper • 2506.10963 • Published Jun 12 • 9
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation Paper • 2502.18364 • Published Feb 25 • 37
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step Paper • 2406.04314 • Published Jun 6, 2024 • 30