Omni-Dish: Photorealistic and Faithful Image Generation and Editing for Arbitrary Chinese Dishes Paper • 2504.09948 • Published Apr 14, 2025
High-Resolution Image Synthesis via Next-Token Prediction Paper • 2411.14808 • Published Nov 22, 2024
HyperSeg: Towards Universal Visual Segmentation with Large Language Model Paper • 2411.17606 • Published Nov 26, 2024 • 1
Denoising with a Joint-Embedding Predictive Architecture Paper • 2410.03755 • Published Oct 2, 2024 • 1
Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input Paper • 2408.15542 • Published Aug 28, 2024
Controllable Text Generation for Large Language Models: A Survey Paper • 2408.12599 • Published Aug 22, 2024 • 65
Involution: Inverting the Inherence of Convolution for Visual Recognition Paper • 2103.06255 • Published Mar 10, 2021
Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks Paper • 1810.12348 • Published Oct 29, 2018
DisTime: Distribution-based Time Representation for Video Large Language Models Paper • 2505.24329 • Published May 30, 2025 • 1
MagicMirror: A Large-Scale Dataset and Benchmark for Fine-Grained Artifacts Assessment in Text-to-Image Generation Paper • 2509.10260 • Published Sep 12, 2025 • 3