arxiv:2504.10307

CROSSAN: Towards Efficient and Effective Adaptation of Multiple Multimodal Foundation Models for Sequential Recommendation

Published on Apr 14, 2025

Abstract

A plug-and-play Cross-modal Side Adapter Network with Mixture of Modality Expert Fusion mechanism is proposed for efficient adaptation of multiple multimodal foundation models in sequential recommendation tasks.

AI-generated summary

In this paper, we explore a less-studied yet practically important problem: how to efficiently and effectively adapt multiple (more than two) multimodal foundation models (MFMs) for the sequential recommendation task. To this end, we propose a plug-and-play Cross-modal Side Adapter Network (CROSSAN), which leverages a fully decoupled side adapter-based paradigm to achieve efficient and scalable adaptation. Compared to state-of-the-art efficient approaches, CROSSAN reduces training time by over 30%, GPU memory consumption by 20%, and trainable parameters by over 57%, while enabling effective cross-modal learning across diverse modalities. To further enhance multimodal fusion, we introduce the Mixture of Modality Expert Fusion (MOMEF) mechanism. Extensive experiments on public benchmarks demonstrate that CROSSAN consistently outperforms existing methods, achieving performance improvements of 6.7% to 8.1% when adapting four foundation models with raw modalities. Moreover, overall performance continues to improve as more MFMs are incorporated. We will release our code and datasets to facilitate future research.
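
The abstract names a mixture-style fusion over modality experts but does not specify its form. As a rough illustration of the general idea only, the sketch below shows a generic softmax-gated fusion of per-modality item embeddings in plain Python. It is not the paper's MOMEF mechanism: the function names `softmax` and `gated_fusion`, the per-modality linear gate, and the toy vectors are all assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(
    modality_embs: Sequence[Sequence[float]],
    gate_weights: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Fuse one embedding per modality with a softmax gate.

    modality_embs: one d-dimensional vector per modality (hypothetical
        outputs of per-modality adapters; an assumption of this sketch).
    gate_weights: one d-dimensional scoring vector per modality
        (hypothetical learnable parameters).
    Returns the fused d-dimensional vector and the gate values.
    """
    # Score each modality with a dot product, then normalize to a gate.
    scores = [
        sum(w * x for w, x in zip(w_m, e_m))
        for w_m, e_m in zip(gate_weights, modality_embs)
    ]
    gates = softmax(scores)
    dim = len(modality_embs[0])
    # Weighted sum of the modality embeddings, one coordinate at a time.
    fused = [
        sum(g * e[i] for g, e in zip(gates, modality_embs))
        for i in range(dim)
    ]
    return fused, gates


# Toy example: four modalities, two-dimensional embeddings.
embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
zero_gate = [[0.0, 0.0] for _ in embs]  # equal scores give a uniform gate
fused, gates = gated_fusion(embs, zero_gate)
```

With all-zero gate weights every modality receives the same gate value, so the fused vector reduces to the plain average of the four embeddings. In a trained model the gate parameters would instead be learned, letting the fusion weight modalities differently.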
