Paper Title

Defending against Backdoor Attack on Deep Neural Networks

Authors

Hao Cheng, Kaidi Xu, Sijia Liu, Pin-Yu Chen, Pu Zhao, Xue Lin

Abstract

Although deep neural networks (DNNs) have achieved great success in various computer vision tasks, it has recently been found that they are vulnerable to adversarial attacks. In this paper, we focus on the so-called \textit{backdoor attack}, which injects a backdoor trigger into a small portion of the training data (also known as data poisoning) such that the trained DNN misclassifies examples containing this trigger. Specifically, we carefully study the effect of both real and synthetic backdoor attacks on the internal responses of vanilla and backdoored DNNs through the lens of Grad-CAM. Moreover, we show that the backdoor attack induces a significant bias in neuron activation in terms of the $\ell_\infty$ norm of an activation map compared to its $\ell_1$ and $\ell_2$ norms. Spurred by these results, we propose \textit{$\ell_\infty$-based neuron pruning} to remove the backdoor from the backdoored DNN. Experiments show that our method can effectively decrease the attack success rate while maintaining high classification accuracy on clean images.
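The abstract only sketches the $\ell_\infty$-based neuron pruning defense. As a rough illustration of the underlying idea, the following is a minimal PyTorch sketch that scores each channel of a chosen convolutional layer by the $\ell_\infty$ norm of its activation map and zeroes out the filters with the largest scores. The function names, the choice of layer, the use of clean calibration data, and the pruning ratio are illustrative assumptions, not the authors' exact procedure.

```python
import torch

def linf_activation_scores(model, layer, data_loader, device="cpu"):
    """Score each channel of `layer` by the l_inf norm of its activation map,
    averaged over a set of (clean) calibration inputs."""
    batch_scores = []

    def hook(_module, _inputs, out):
        # out: (batch, channels, H, W); take the l_inf norm over spatial positions
        per_channel = out.detach().abs().flatten(2).max(dim=2).values  # (batch, C)
        batch_scores.append(per_channel.mean(dim=0))                   # (C,)

    handle = layer.register_forward_hook(hook)
    model.eval()
    with torch.no_grad():
        for images, _ in data_loader:
            model(images.to(device))
    handle.remove()
    return torch.stack(batch_scores).mean(dim=0)  # one score per channel


def prune_by_linf(layer, scores, ratio=0.05):
    """Zero out the conv filters with the largest l_inf activation scores.
    The selection criterion and `ratio` are simplified, illustrative choices."""
    num_prune = max(1, int(ratio * scores.numel()))
    idx = torch.topk(scores, num_prune).indices
    with torch.no_grad():
        layer.weight[idx] = 0.0
        if layer.bias is not None:
            layer.bias[idx] = 0.0
    return idx
```

For a ResNet-style network, one might compute scores with linf_activation_scores(model, model.layer4[-1].conv2, clean_loader) and pass them to prune_by_linf; in practice the pruning ratio would be tuned so that the attack success rate drops while clean accuracy is preserved.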
