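Since 2018, the center has conducted in-depth research on trustworthy artificial intelligence, including adversarial attacks and adversarial defenses for image classification, and has produced a number of results. Early on, in 2019 and 2020, we published three papers at venues including ICLR, each of which has since received more than 80 Google Scholar citations. Also in 2019, we began exploring adversarial examples for text classification in natural language processing and published a greedy-strategy-based adversarial attack algorithm at ACL 2019 (Oral), a top conference in computational linguistics, making us one of the first teams to bring adversarial example research into natural language processing. That paper has received more than 220 Google Scholar citations and has become a representative work on adversarial examples in natural language processing. The text classification adversarial defense algorithm we proposed subsequently was published at UAI 2021; it has received more than 50 Google Scholar citations and has attracted wide attention from the academic community in China and abroad. In 2020, we began research on adversarial attacks against machine translation models, and the resulting work was accepted as an Oral presentation at ACL 2021, a top conference in computational linguistics.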
Representative publications:
Image adversarial attacks:
[1] Jiadong Lin#, Chuanbiao Song#, Kun He#*, Liwei Wang, John E. Hopcroft. Nesterov Accelerated Gradient and Scale Invariance for Adversarial Attacks. ICLR 2020.
[2] Xiaosen Wang, Kun He*. Enhancing the Transferability of Adversarial Attacks Through Variance Tuning. CVPR 2021: 1924-1933.
[3] Yifeng Xiong#, Jiadong Lin#, Min Zhang, John E. Hopcroft, Kun He*. Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability. CVPR 2022: 14963-14972.
[4] Xiaosen Wang, Xuanran He, Jingdong Wang, Kun He*. Admix: Enhancing the Transferability of Adversarial Attacks. ICCV 2021: 16138-16147.
[5] Xiaosen Wang, Zeliang Zhang, Kangheng Tong, Dihong Gong, Kun He*, Zhifeng Li, Wei Liu. Triangle Attack: A Query-Efficient Decision-Based Adversarial Attack. ECCV (5) 2022: 156-174.
Image adversarial defenses:
[1] Chuanbiao Song, Kun He*, Liwei Wang, John E. Hopcroft. Improving the Generalization of Adversarial Training with Domain Adaptation. ICLR 2019.
[2] Chuanbiao Song#, Kun He#*, Jiadong Lin#, Liwei Wang, John E. Hopcroft. Robust Local Features for Improving the Generalization of Adversarial Training. ICLR 2020.
Text adversarial attacks:
[1] Shuhuai Ren, Yihe Deng, Kun He*, Wanxiang Che. Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency. ACL (Oral) 2019: 1085-1097.
[2] Xinze Zhang#, Junzhe Zhang#, Zhenhua Chen#, Kun He#*. Crafting Adversarial Examples for Neural Machine Translation. ACL (Oral) 2021: 1967-1977.
Text adversarial defenses:
[1] Xiaosen Wang, Jin Hao, Yichen Yang, Kun He*. Natural Language Adversarial Defense through Synonym Encoding. UAI 2021: 823-833.
[2] Xiaosen Wang#, Yichen Yang#, Yihe Deng, Kun He*. Adversarial Training with Fast Gradient Projection Method against Synonym Substitution Based Text Attacks. AAAI 2021: 13997-14005.
[3] Yichen Yang#, Xiaosen Wang#, Kun He*. Robust Textual Embedding against Word-level Adversarial Attacks. UAI 2022: 2214-2224.
[4] Xiaosen Wang#, Yifeng Xiong#, Kun He*. Detecting Textual Adversarial Examples through Randomized Substitution and Vote. UAI 2022: 2056-2065.
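Representative competition results:
1. Champion, ImageNet image classification adversarial attack competition, Phase 2 of the 2020 Security AI Challenger Program;
2. 3rd place, defense track, IJCAI 2019 International AI Adversarial Attack and Defense Challenge;
3. 4th place, white-box adversarial attack on defense models competition, Phase 6 of the Security AI Challenger Program at CVPR 2021.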