Paper Title
Adversarial Attack and Defense of Structured Prediction Models
Paper Authors
Paper Abstract
Building an effective adversarial attacker and elaborating countermeasures against adversarial attacks in natural language processing (NLP) have attracted a lot of research in recent years. However, most existing approaches focus on classification problems. In this paper, we investigate attacks and defenses for structured prediction tasks in NLP. Besides the difficulty of perturbing discrete words and the sentence fluency problem faced by attackers in any NLP task, there is a specific challenge for attackers of structured prediction models: the structured output of a structured prediction model is sensitive to small perturbations in the input. To address these problems, we propose a novel and unified framework that learns to attack a structured prediction model using a sequence-to-sequence model, guided by feedback from multiple reference models of the same structured prediction task. Based on the proposed attack, we further reinforce the victim model with adversarial training, making its predictions more robust and accurate. We evaluate the proposed framework on dependency parsing and part-of-speech tagging. Automatic and human evaluations show that our proposed framework succeeds both in attacking state-of-the-art structured prediction models and in boosting them with adversarial training.
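The reference-model feedback idea in the abstract can be sketched in a few lines: a candidate perturbation scores highly when the reference parsers still agree with one another (so the sentence plausibly keeps a stable gold structure) while the victim model deviates from their consensus. The sketch below is a hypothetical illustration under that reading, not the paper's implementation; the names (Parse, agreement, attack_reward, select_adversarial) are my own, and the reranking loop merely stands in for the learned sequence-to-sequence attacker.

```python
# Minimal sketch (assumed, not the paper's code) of scoring adversarial
# candidates by feedback from multiple reference models in dependency parsing.
from typing import Callable, List, Sequence, Tuple

Parse = List[int]                  # one dependency head index per token
Parser = Callable[[str], Parse]    # a model mapping a sentence to a parse

def agreement(a: Parse, b: Parse) -> float:
    """Unlabeled attachment agreement between two head sequences."""
    n = min(len(a), len(b))
    return sum(a[i] == b[i] for i in range(n)) / max(n, 1)

def attack_reward(victim_out: Parse, reference_outs: Sequence[Parse]) -> float:
    """High when the reference models agree with each other but the victim
    model disagrees with them on the perturbed sentence."""
    pairs = [(i, j) for i in range(len(reference_outs))
             for j in range(i + 1, len(reference_outs))]
    ref_consensus = (sum(agreement(reference_outs[i], reference_outs[j])
                         for i, j in pairs) / len(pairs)) if pairs else 1.0
    victim_match = (sum(agreement(victim_out, r) for r in reference_outs)
                    / len(reference_outs))
    return ref_consensus * (1.0 - victim_match)

def select_adversarial(candidates: Sequence[str], victim: Parser,
                       references: Sequence[Parser]) -> Tuple[str, float]:
    """Pick the candidate paraphrase (e.g. sampled from a seq2seq generator)
    with the highest attack reward."""
    scored = [(attack_reward(victim(s), [r(s) for r in references]), s)
              for s in candidates]
    best_reward, best_sentence = max(scored)
    return best_sentence, best_reward
```

In the paper, a reward of this kind trains the sequence-to-sequence attacker itself, and the resulting adversarial sentences are then fed back into adversarial training of the victim model; the candidate-reranking loop above is only there to keep the sketch self-contained.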