Paper Title
TextDecepter: Hard Label Black Box Attack on Text Classifiers
Paper Authors
Paper Abstract
Machine learning has been proven to be susceptible to carefully crafted samples, known as adversarial examples. The generation of these adversarial examples helps to make models more robust and gives us insight into their underlying decision-making. Over the years, researchers have successfully attacked image classifiers in both white-box and black-box settings. However, these methods are not directly applicable to text, as text data is discrete. In recent years, research on crafting adversarial examples against textual applications has been on the rise. In this paper, we present a novel approach for hard-label black-box attacks against Natural Language Processing (NLP) classifiers, where no model information is disclosed and an attacker can only query the model to obtain the classifier's final decision, without the confidence scores of the classes involved. Such an attack scenario applies to real-world black-box models used for security-sensitive applications such as sentiment analysis and toxic content detection.
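To make the hard-label query access described in the abstract concrete, the following is a minimal Python sketch of the attack setting, not the paper's implementation. The `HardLabelOracle` wrapper, the `predict_label` callable, and the toy keyword-based classifier are hypothetical stand-ins for any deployed model; the only assumption taken from the abstract is that the attacker receives the final label and no confidence scores.

```python
# Minimal sketch of the hard-label black-box setting (illustrative only).
from typing import Callable


class HardLabelOracle:
    """Wraps a classifier so that only the final decision is exposed."""

    def __init__(self, predict_label: Callable[[str], str]):
        self._predict_label = predict_label  # e.g. returns "positive" / "negative"
        self.query_count = 0                 # attacks are judged by query budget

    def query(self, text: str) -> str:
        self.query_count += 1
        return self._predict_label(text)     # no confidence scores are returned


def is_successful_attack(oracle: HardLabelOracle,
                         original_text: str,
                         perturbed_text: str) -> bool:
    """An adversarial example succeeds if it flips the hard label."""
    return oracle.query(perturbed_text) != oracle.query(original_text)


if __name__ == "__main__":
    # Toy stand-in classifier: labels text by a single keyword.
    toy_model = lambda text: "negative" if "bad" in text else "positive"
    oracle = HardLabelOracle(toy_model)

    original = "the movie was bad"
    candidate = "the movie was underwhelming"  # hypothetical synonym swap
    print(is_successful_attack(oracle, original, candidate))  # True
    print(oracle.query_count)                                 # 2 queries used
```

In this setting the attacker's cost is measured purely in queries, since each call to `query` returns a single label and nothing else; any attack strategy must decide which perturbations to try using only these label flips.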