MAGIC: A Co-Evolving AttackerโDefender Adversarial Game for Robust LLM Safety
This repository provides the paper and official model links for MAGIC, a multi-backbone instruction-tuned model family designed to improve the robustness and safety of large language models.
๐ง Overview
MAGIC formulates LLM safety alignment as a co-evolving adversarial game between an attacker and a defender.
Instead of relying on static red-teaming datasets, the attacker continuously generates increasingly challenging harmful or policy-violating prompts, while the defender model is iteratively trained to resist these attacks without sacrificing helpfulness. Through this dynamic co-evolution process, the defender generalizes better to unseen and adaptive jailbreak attacks, leading to improved robustness in real-world deployment scenarios.
๐ Paper
- Title: MAGIC: A Co-Evolving AttackerโDefender Adversarial Game for Robust LLM Safety
- Authors: Xiaoyu Wen, Zhida He, Han Qi, Ziyu Wan, Ying Wen, Tianhang Zheng, Xingcheng Xu, Chaochao Lu, Qiaosheng Zhang
- arXiv: https://arxiv.org/abs/2602.01539
- PDF: https://arxiv.org/pdf/2602.01539
๐ Code & Repository
- Official GitHub Repository:
https://github.com/BattleWen/MAGIC
๐ Datasets
The MAGIC Attack Pool Benchmark used in this work is publicly available on Hugging Face:
from datasets import load_dataset
dataset = load_dataset("XiaoyuWen/MAGIC-Attack-Pool-Benchmark")
๐ค Models
Official model checkpoints:
Qwen2.5-7B-Instruct
https://huggingface.co/XiaoyuWen/MAGIC-Qwen2.5-7B-InstructQwen2.5-14B-Instruct
https://huggingface.co/XiaoyuWen/MAGIC-Qwen2.5-14B-InstructLLaMA3.1-8B-Instruct
https://huggingface.co/XiaoyuWen/MAGIC-Llama3.1-8B-Instruct
๐ Citation
@article{wen2026magic,
title={MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety},
author={Wen, Xiaoyu and He, Zhida and Qi, Han and Wan, Ziyu and Wen, Ying and Zheng, Tianhang and Xu, Xingcheng and Lu, Chaochao and Zhang, Qiaosheng},
journal={arXiv preprint arxiv:2602.01539},
year={2026}
}