xb-chang
AI & ML interests
None yet
Organizations
None yet
LLMs
Diffusion
Controlling Space and Time with Diffusion Models
Paper • 2407.07860 • Published • 17
DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents
Paper • 2407.03300 • Published • 14
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
Paper • 2407.01392 • Published • 45
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models
Paper • 2407.02687 • Published • 25
multimedia
Video-to-Audio Generation with Hidden Alignment
Paper • 2407.07464 • Published • 17
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
Paper • 2407.04842 • Published • 56
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Paper • 2407.04051 • Published • 39
Neural Arch
Reinforcement Learning
Noisy datasets
vision language models (VLM)
PaliGemma: A versatile 3B VLM for transfer
Paper • 2407.07726 • Published • 72
Vision language models are blind
Paper • 2407.06581 • Published • 84
CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging
Paper • 2407.07315 • Published • 7
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
Paper • 2407.06189 • Published • 26
Data Generation
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
Paper • 2407.05282 • Published • 16
Training Task Experts through Retrieval Based Distillation
Paper • 2407.05463 • Published • 10
Learning Action and Reasoning-Centric Image Editing from Videos and Simulations
Paper • 2407.03471 • Published • 31
Video Analysis
Efficient LLMs