Paper Title

Learn2Perturb: An End-to-End Feature Perturbation Learning to Improve Adversarial Robustness

Paper Authors

Ahmadreza Jeddi, Mohammad Javad Shafiee, Michelle Karg, Christian Scharfenberger, Alexander Wong

Paper Abstract

While deep neural networks have been achieving state-of-the-art performance across a wide variety of applications, their vulnerability to adversarial attacks limits their widespread deployment in safety-critical applications. Alongside other adversarial defense approaches being investigated, there has been very recent interest in improving the adversarial robustness of deep neural networks by introducing perturbations during the training process. However, such methods leverage fixed, pre-defined perturbations and require significant hyper-parameter tuning, which makes them very difficult to leverage in a general fashion. In this study, we introduce Learn2Perturb, an end-to-end feature perturbation learning approach for improving the adversarial robustness of deep neural networks. More specifically, we introduce novel perturbation-injection modules, incorporated at each layer, that perturb the feature space and increase uncertainty in the network. This feature perturbation is performed at both the training and inference stages. Furthermore, inspired by the Expectation-Maximization algorithm, an alternating back-propagation training scheme is introduced to train the network weights and noise parameters in alternation. Experimental results on the CIFAR-10 and CIFAR-100 datasets show that the proposed Learn2Perturb method yields deep neural networks that are $4$-$7\%$ more robust against $l_{\infty}$ FGSM and PGD adversarial attacks, and significantly outperforms the state-of-the-art against the $l_2$ C\&W attack and a wide range of well-known black-box attacks.
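As a rough illustration of the mechanism the abstract describes, the PyTorch sketch below shows what a perturbation-injection module with learnable noise parameters might look like. This is a minimal sketch under stated assumptions, not the paper's reference implementation: the module name `PerturbationInjection`, the per-channel Gaussian parameterization, and the softplus constraint are all illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerturbationInjection(nn.Module):
    """Adds learnable, zero-mean Gaussian noise to a conv feature map.

    Sketch only: one trainable scale per channel is an illustrative
    parameterization, not necessarily the paper's. Noise is sampled on
    every forward pass, so it is active at both training and inference
    time, matching the abstract's description.
    """

    def __init__(self, num_channels: int, init_scale: float = 0.1):
        super().__init__()
        # Unconstrained parameter; softplus below keeps the effective sigma positive.
        self.raw_sigma = nn.Parameter(torch.full((num_channels,), init_scale))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        sigma = F.softplus(self.raw_sigma).view(1, -1, 1, 1)  # broadcast over N, H, W
        return x + sigma * torch.randn_like(x)
```

The alternating, EM-inspired training could likewise be approximated by updating the network weights and the noise parameters in turn; the two-optimizer split and the even/odd schedule below are hypothetical simplifications of the procedure the abstract names.

```python
def alternating_step(model, batch, criterion, opt_weights, opt_noise, step):
    """One step of an alternating update: network weights on even steps,
    noise parameters on odd steps (this schedule is an assumption)."""
    x, y = batch
    loss = criterion(model(x), y)
    opt = opt_weights if step % 2 == 0 else opt_noise
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```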
