Paper Title
Deviations in Representations Induced by Adversarial Attacks
Paper Authors
Paper Abstract
Deep learning has been a popular topic and has achieved success in many areas. It has drawn the attention of researchers and machine learning practitioners alike, with trained models deployed in a variety of settings. Alongside these achievements, research has shown that deep learning models are vulnerable to adversarial attacks. This finding opened a new direction of research, in which algorithms are developed to attack and defend vulnerable networks. Our interest is in understanding how these attacks alter the intermediate representations of deep learning models. We present a method for measuring and analyzing the deviations in representations induced by adversarial attacks, tracked progressively across a selected set of layers. Experiments are conducted with an assortment of attack algorithms on the CIFAR-10 dataset, and plots are created to visualize the impact of adversarial attacks across different layers of a network.
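The core measurement the abstract describes, comparing a clean input's intermediate representations with those of a perturbed input, layer by layer, can be illustrated with a minimal sketch. This is not the paper's actual method or model: the toy ReLU network, the random-sign perturbation, and the L2 distance metric are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-layer ReLU network with fixed random weights
# (a stand-in for a trained model; assumption, not the paper's network).
weights = [
    rng.standard_normal((8, 16)),
    rng.standard_normal((16, 16)),
    rng.standard_normal((16, 10)),
]

def forward_with_activations(x):
    """Run a forward pass, returning the representation at each layer."""
    activations = []
    h = x
    for W in weights:
        h = np.maximum(h @ W, 0.0)  # ReLU
        activations.append(h)
    return activations

def layerwise_deviation(x_clean, x_adv):
    """L2 distance between clean and perturbed representations, per layer."""
    clean = forward_with_activations(x_clean)
    adv = forward_with_activations(x_adv)
    return [float(np.linalg.norm(a - c)) for c, a in zip(clean, adv)]

x = rng.standard_normal(8)
# Stand-in perturbation: small sign noise (FGSM-shaped, but not gradient-based).
x_adv = x + 0.03 * np.sign(rng.standard_normal(8))

deviations = layerwise_deviation(x, x_adv)
print(deviations)  # one deviation value per layer
```

Plotting `deviations` against layer index for many inputs, as the abstract describes, would show how a small input perturbation propagates and grows (or shrinks) through the network.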