Paper Title
Heat and Blur: An Effective and Fast Defense Against Adversarial Examples
Paper Authors
Paper Abstract
The growing incorporation of artificial neural networks (NNs) into many fields, and especially into life-critical systems, is restrained by their vulnerability to adversarial examples (AEs). Some existing defense methods can increase NNs' robustness, but they often require special architectures or training procedures and are inapplicable to already trained models. In this paper, we propose a simple defense that combines feature visualization with input modification, and can therefore be applied to various pre-trained networks. By reviewing several interpretability methods, we gain new insights into the influence of AEs on NNs' computation. Based on that, we hypothesize that information about the "true" object is preserved within the NN's activity even when the input is adversarial, and present a feature visualization variant that can extract that information in the form of relevance heatmaps. We then use these heatmaps as the basis of our defense, in which the adversarial effects are corrupted by massive blurring. We also provide a new evaluation metric that captures the effects of both attacks and defenses more thoroughly and descriptively, and demonstrate the effectiveness of the defense and the utility of the suggested metric with VGG19 results on the ImageNet dataset.
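The core pipeline the abstract describes (obtain a relevance heatmap for the "true" object, then destroy adversarial perturbations with heavy blurring while retaining high-relevance regions) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the blending scheme, and the stand-in heatmap are all assumptions for illustration; a real pipeline would derive the heatmap from a feature-visualization/relevance method applied to the pre-trained network.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def heat_and_blur_defense(image, heatmap, sigma=8.0):
    """Hypothetical sketch of the 'heat and blur' idea: heavily blur the
    input, then restore detail only where the relevance heatmap is high,
    so adversarial perturbations in low-relevance regions are corrupted."""
    # Normalize the heatmap to [0, 1] and broadcast over color channels.
    h = (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min() + 1e-8)
    if image.ndim == 3:
        h = h[..., None]
        sigmas = (sigma, sigma, 0)  # do not blur across channels
    else:
        sigmas = sigma
    # Massive Gaussian blur destroys high-frequency adversarial noise.
    blurred = gaussian_filter(image.astype(float), sigma=sigmas)
    # Keep original pixels where relevance is high, blurred pixels elsewhere.
    return h * image + (1.0 - h) * blurred

# Usage with random stand-in data (224x224 RGB, as for VGG19 inputs).
img = np.random.rand(224, 224, 3)
heat = np.zeros((224, 224))
heat[80:140, 80:140] = 1.0  # pretend the "true" object sits here
defended = heat_and_blur_defense(img, heat)
```

The blend `h * image + (1 - h) * blurred` is one simple way to realize "relevance-guided" blurring; the defended image would then be fed back to the classifier in place of the raw (possibly adversarial) input.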