Paper Title

Nowhere to Hide: A Lightweight Unsupervised Detector against Adversarial Examples

Paper Authors

Hui Liu, Bo Zhao, Kehuan Zhang, Peng Liu

Paper Abstract

Although deep neural networks (DNNs) have shown impressive performance on many perceptual tasks, they are vulnerable to adversarial examples, which are generated by adding slight but maliciously crafted perturbations to benign images. Adversarial detection is an important technique for identifying adversarial examples before they are fed into target DNNs. Previous studies on detecting adversarial examples either target specific attacks or require expensive computation. How to design a lightweight unsupervised detector is still a challenging problem. In this paper, we propose an AutoEncoder-based Adversarial Examples (AEAE) detector that can guard DNN models by detecting adversarial examples with low computation in an unsupervised manner. AEAE contains only a shallow autoencoder but plays two roles. First, a well-trained autoencoder has learned the manifold of benign examples. This autoencoder produces a large reconstruction error for adversarial images with large perturbations, so we can detect significantly perturbed adversarial examples based on the reconstruction error. Second, the autoencoder can filter out small noise and thereby change the DNN's prediction on adversarial examples with small perturbations. This helps to detect slightly perturbed adversarial examples based on the prediction distance. To cover both cases, we use the reconstruction error and prediction distance computed on benign images to construct a two-tuple feature set and train an adversarial detector with the isolation forest algorithm. We show empirically that AEAE is unsupervised and inexpensive against state-of-the-art attacks. Through detection in these two cases, adversarial examples have nowhere to hide.
