Paper Title
Adversarial Detection by Approximation of Ensemble Boundary
Paper Authors
Paper Abstract
Despite being effective in many application areas, Deep Neural Networks (DNNs) are vulnerable to attack. In object recognition, the attack takes the form of a small perturbation added to an image that causes the DNN to misclassify yet appears no different to a human. Adversarial attacks lead to defences that are themselves subject to attack, and the attack/defence strategies provide important information about the properties of DNNs. In this paper, a novel method of detecting adversarial attacks is proposed for an ensemble of DNNs solving two-class pattern recognition problems. The ensemble is combined using Walsh coefficients, which are capable of approximating Boolean functions and thereby controlling the complexity of the decision boundary. The hypothesis in this paper is that decision boundaries with high curvature allow adversarial perturbations to be found, but that the perturbations in turn change the curvature of the decision boundary, which is then approximated differently by the Walsh coefficients compared with clean images. Besides controlling boundary complexity, the coefficients also measure the correlation with class labels, which may aid in understanding the learning and transferability properties of DNNs. While the experiments here use images, the proposed approach of modelling two-class ensemble decision boundaries could in principle be applied to any application area.
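To make the combination scheme concrete, the following is a minimal sketch (not the paper's implementation) of estimating Walsh coefficients from the binary outputs of an ensemble. It assumes M base classifiers with outputs mapped to {-1, +1} and two-class labels also in {-1, +1}; the function name `walsh_coefficients` and the `max_order` parameter are illustrative choices, not taken from the paper.

```python
# Minimal sketch: estimate Walsh coefficients of the Boolean function
# mapping ensemble vote patterns to class labels. Assumes votes and
# labels are coded in {-1, +1}.
import itertools
import numpy as np

def walsh_coefficients(votes, labels, max_order=2):
    """Empirically estimate Walsh coefficients up to a given order.

    votes  : (n_samples, M) array, entries in {-1, +1}
    labels : (n_samples,) array, entries in {-1, +1}

    Returns a dict mapping each subset S of classifier indices
    (|S| <= max_order) to E[label * prod_{i in S} vote_i], i.e. the
    correlation between the Walsh basis function chi_S and the labels.
    """
    n_samples, M = votes.shape
    coeffs = {}
    for order in range(0, max_order + 1):
        for S in itertools.combinations(range(M), order):
            # chi_S(x) = product of the selected votes; the empty
            # subset gives the zeroth-order (bias) coefficient, since
            # a product over no factors is 1.
            basis = np.prod(votes[:, list(S)], axis=1)
            coeffs[S] = float(np.mean(labels * basis))
    return coeffs

# Toy usage: 3 classifiers, 4 samples.
votes = np.array([[ 1,  1, -1],
                  [ 1, -1, -1],
                  [-1,  1,  1],
                  [ 1,  1,  1]])
labels = np.array([1, -1, 1, 1])
print(walsh_coefficients(votes, labels, max_order=2))
```

Under this estimate, each coefficient is the empirical correlation between a Walsh basis function over a subset of classifiers and the class label, and truncating the expansion at low order limits the complexity of the approximated decision boundary, which is the sense in which the abstract describes the coefficients as both combining the ensemble and controlling boundary complexity.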