FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion Paper • 2405.04883 • Published May 8, 2024
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces Paper • 2407.11895 • Published Jul 16, 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Paper • 2408.16532 • Published Aug 29, 2024
MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization Paper • 2410.12957 • Published Oct 16, 2024
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup Paper • 2410.21269 • Published Oct 28, 2024
APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization Paper • 2506.21655 • Published Jun 26, 2025