Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness • Paper • 2510.01670 • Published Oct 2
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots • Paper • 2504.03735 • Published Apr 1
Unfair Alignment: Examining Safety Alignment Across Vision Encoder Layers in Vision-Language Models • Paper • 2411.04291 • Published Nov 6, 2024
Cross-Modal Safety Alignment: Is textual unlearning all you need? • Paper • 2406.02575 • Published May 27, 2024
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks • Paper • 2310.10844 • Published Oct 16, 2023
Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models • Paper • 2307.14539 • Published Jul 26, 2023