MAGIC: A Co-Evolving Attacker–Defender Adversarial Game for Robust LLM Safety

This repository provides the paper and official model links for MAGIC, a multi-backbone instruction-tuned model family designed to improve the robustness and safety of large language models.

🧠 Overview

MAGIC formulates LLM safety alignment as a co-evolving adversarial game between an attacker and a defender.

Instead of relying on static red-teaming datasets, the attacker continuously generates increasingly challenging harmful or policy-violating prompts, while the defender model is iteratively trained to resist these attacks without sacrificing helpfulness. Through this dynamic co-evolution process, the defender generalizes better to unseen and adaptive jailbreak attacks, leading to improved robustness in real-world deployment scenarios.
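The co-evolution loop above can be sketched as a toy simulation. This is an illustrative sketch only, not the authors' training code: the attacker and defender here are simple counters standing in for LLMs, and the function names (`attacker_generate`, `defender_refuses`, `coevolve`) are hypothetical. In the paper's actual setting, each round would involve generating adversarial prompts with an attacker LLM and fine-tuning the defender on its failures.

```python
import random

def attacker_generate(pool, difficulty):
    # Attacker proposes harder variants of prompts from the attack pool.
    # Each attack is a (prompt, difficulty) pair; difficulty rises per round.
    return [(p, difficulty) for p in random.sample(pool, k=2)]

def defender_refuses(attack, robustness):
    # Toy success criterion: the defender resists an attack when its
    # accumulated robustness matches or exceeds the attack's difficulty.
    _, difficulty = attack
    return robustness >= difficulty

def coevolve(pool, rounds=5):
    # One co-evolution run: attacker escalates, defender trains on failures.
    difficulty, robustness = 1, 0
    for _ in range(rounds):
        attacks = attacker_generate(pool, difficulty)
        failures = [a for a in attacks if not defender_refuses(a, robustness)]
        robustness += len(failures)  # "train" the defender on what broke it
        difficulty += 1              # attacker adapts and gets harder
    return robustness

pool = ["prompt_a", "prompt_b", "prompt_c"]
print(coevolve(pool))
```

The key dynamic the sketch captures is that neither side is static: the defender only improves on rounds where the attacker finds failures, which in turn pushes the attacker to escalate.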

📄 Paper

🔗 Code & Repository

📊 Datasets

The MAGIC Attack Pool Benchmark used in this work is publicly available on Hugging Face:

```python
from datasets import load_dataset

# Load the MAGIC Attack Pool Benchmark from the Hugging Face Hub
dataset = load_dataset("XiaoyuWen/MAGIC-Attack-Pool-Benchmark")
```

🤖 Models

Official model checkpoints:

📚 Citation

```bibtex
@article{wen2026magic,
    title={MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety},
    author={Wen, Xiaoyu and He, Zhida and Qi, Han and Wan, Ziyu and Wen, Ying and Zheng, Tianhang and Xu, Xingcheng and Lu, Chaochao and Zhang, Qiaosheng},
    journal={arXiv preprint arXiv:2602.01539},
    year={2026}
}
```