arxiv:2512.06589

OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation

Published on Dec 6
· Submitted by jiaxiaojunQAQ on Dec 9
Authors:

Abstract

OmniSafeBench-MM is a unified benchmark and toolbox for evaluating multi-modal jailbreak attacks and defenses, covering 13 attack methods, 15 defense strategies, and 9 major risk domains.

AI-generated summary

Recent advances in multi-modal large language models (MLLMs) have enabled unified perception-reasoning capabilities, yet these systems remain highly vulnerable to jailbreak attacks that bypass safety alignment and induce harmful behaviors. Existing benchmarks such as JailBreakV-28K, MM-SafetyBench, and HADES provide valuable insights into multi-modal vulnerabilities, but they typically focus on limited attack scenarios, lack standardized defense evaluation, and offer no unified, reproducible toolbox. To address these gaps, we introduce OmniSafeBench-MM, a comprehensive toolbox for multi-modal jailbreak attack-defense evaluation. OmniSafeBench-MM integrates 13 representative attack methods, 15 defense strategies, and a diverse dataset spanning 9 major risk domains and 50 fine-grained categories, structured across consultative, imperative, and declarative inquiry types to reflect realistic user intentions. Beyond data coverage, it establishes a three-dimensional evaluation protocol measuring (1) harmfulness, distinguished by a granular, multi-level scale ranging from low-impact individual harm to catastrophic societal threats, (2) intent alignment between responses and queries, and (3) response detail level, enabling nuanced safety-utility analysis. We conduct extensive experiments on 10 open-source and 8 closed-source MLLMs to reveal their vulnerability to multi-modal jailbreak attacks. By unifying data, methodology, and evaluation into an open-source, reproducible platform, OmniSafeBench-MM provides a standardized foundation for future research. The code is released at https://github.com/jiaxiaojunQAQ/OmniSafeBench-MM.
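
To make the three-dimensional protocol concrete, here is a minimal Python sketch of a per-response score record. The level names, the five-point harm scale, and the success rule are illustrative assumptions inferred from the abstract, not the toolbox's actual schema.

```python
# A minimal sketch of the three-dimensional scoring record described in the
# abstract. Names, scales, and the success rule are assumptions; the
# released toolbox's schema may differ.
from dataclasses import dataclass

# Assumed harm scale: the paper describes a multi-level range from
# low-impact individual harm to catastrophic societal threats; the exact
# number of levels here is an assumption.
HARM_LEVELS = {
    0: "harmless / refusal",
    1: "low-impact individual harm",
    2: "serious individual harm",
    3: "large-scale or group harm",
    4: "catastrophic societal threat",
}

@dataclass
class JudgeScore:
    harmfulness: int        # index into HARM_LEVELS
    intent_alignment: bool  # does the response serve the query's intent?
    detail_level: int       # 0 = none/vague ... 3 = actionable step-by-step

    def jailbreak_succeeded(self) -> bool:
        # One plausible success rule: harmful, on-intent, and detailed
        # enough to be useful; real judging may weight dimensions differently.
        return (self.harmfulness >= 1
                and self.intent_alignment
                and self.detail_level >= 2)
```

Separating the three dimensions like this is what enables the safety-utility analysis the paper describes: a refusal scores low on detail, while a harmful but off-intent answer is not counted as a successful jailbreak.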

Community

Paper author · Paper submitter

This work presents OmniSafeBench-MM, a unified, open-source benchmark and toolbox for comprehensive evaluation of multimodal jailbreak attack and defense methods. It integrates 13 representative attack techniques, 15 defense strategies, and a diverse dataset spanning 9 risk domains and 50 fine-grained categories, covering realistic query types (consultative, imperative, declarative). We also propose a three-dimensional evaluation protocol measuring harmfulness (from low-impact individual harm to societal-level threats), intent alignment, and response detail level, allowing nuanced analysis of the safety-utility tradeoff. Our extensive experiments across 10 open-source and 8 closed-source multimodal LLMs reveal widespread vulnerabilities to multimodal jailbreak attacks. By unifying data, methods, and evaluation, OmniSafeBench-MM offers a standardized, reproducible platform that we hope will become a foundational resource for future research on safe, robust multimodal LLMs.
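
For readers browsing before opening the repository, the following Python sketch shows one way such a toolbox could cross its attack methods with its defense strategies over the dataset. The interfaces (attack.craft, defense.wrap, judge.score) are hypothetical placeholders and may not match the released code; consult the GitHub repo for the real API.

```python
# Illustrative attack-defense evaluation loop. All object interfaces here
# (attack.craft, defense.wrap, judge.score) are hypothetical assumptions,
# not the actual OmniSafeBench-MM API.
from itertools import product

def run_benchmark(model, attacks, defenses, dataset, judge):
    """Score every (attack, defense) pair on every dataset sample."""
    results = []
    for attack, defense in product(attacks, defenses):
        guarded_model = defense.wrap(model)       # apply one defense strategy
        for sample in dataset:                    # harmful query + paired image
            prompt, image = attack.craft(sample)  # craft multimodal jailbreak
            response = guarded_model.generate(prompt, image)
            results.append({
                "attack": attack.name,
                "defense": defense.name,
                # three-dimensional score: harmfulness, intent, detail
                "score": judge.score(sample, response),
            })
    return results
```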

