Paper Title
Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP
Paper Authors
Paper Abstract
Textual adversarial samples play important roles in multiple subfields of NLP research, including security, evaluation, explainability, and data augmentation. However, most work mixes all these roles together, obscuring the problem definitions and research goals of the security role, which aims to reveal the practical concerns of NLP models. In this paper, we rethink the research paradigm of textual adversarial samples in security scenarios. We discuss the deficiencies of previous work and propose two suggestions for research on security-oriented adversarial NLP (SoadNLP): (1) evaluate methods on security tasks to demonstrate real-world concerns; (2) consider real-world attackers' goals instead of developing impractical methods. To this end, we first collect, process, and release a collection of security datasets, Advbench. Then, we reformalize the task and adjust the emphasis on the different goals of SoadNLP. Next, we propose a simple method based on heuristic rules that can easily fulfill the actual adversarial goals, simulating real-world attack methods. We conduct experiments on both the attack and the defense sides on Advbench. Experimental results show that our method has higher practical value, indicating that research in SoadNLP can start from our new benchmark. All the code and data of Advbench can be obtained at \url{https://github.com/thunlp/Advbench}.
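The abstract mentions a simple attack based on heuristic rules but does not spell out the rules here. Below is a minimal, hypothetical Python sketch of what such rule-based perturbations might look like (word splitting, character insertion, leetspeak substitution applied to filtered keywords); the specific rules, the FLAGGED keyword set, and the function names are illustrative assumptions, not the paper's actual method.

    # Hypothetical sketch of heuristic-rule perturbations in the spirit of the
    # abstract: cheaply rewrite flagged words so a naive keyword filter misses
    # them while a human reader still recovers the meaning. The rules and the
    # keyword list below are illustrative assumptions, not the paper's method.
    import random

    # Toy set of terms a keyword-based content filter might flag (assumption).
    FLAGGED = {"attack", "bomb"}

    def perturb_word(word: str) -> str:
        """Apply one randomly chosen heuristic rule to a flagged word."""
        rules = [
            lambda w: " ".join(w),                                 # split: "bomb" -> "b o m b"
            lambda w: w[: len(w) // 2] + "-" + w[len(w) // 2:],    # insert a hyphen mid-word
            lambda w: w.replace("o", "0").replace("a", "@"),       # leetspeak character swap
        ]
        return random.choice(rules)(word)

    def heuristic_attack(sentence: str) -> str:
        """Perturb only the tokens a simple keyword filter would catch."""
        return " ".join(
            perturb_word(tok) if tok.lower() in FLAGGED else tok
            for tok in sentence.split()
        )

    if __name__ == "__main__":
        random.seed(0)
        print(heuristic_attack("they plan to attack the building"))

The design point such a sketch illustrates is the paper's claim that real-world attackers favor cheap, goal-fulfilling edits over the expensive, imperceptibility-constrained search procedures common in prior adversarial NLP work.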