Paper Title

What Can the Neural Tangent Kernel Tell Us About Adversarial Robustness?

Authors

Nikolaos Tsilivis, Julia Kempe

Abstract

The adversarial vulnerability of neural nets and the subsequent techniques to create robust models have attracted significant attention; yet we still lack a full understanding of this phenomenon. Here, we study adversarial examples of trained neural networks through analytical tools afforded by recent theoretical advances connecting neural networks and kernel methods, namely the Neural Tangent Kernel (NTK), following a growing body of work that leverages the NTK approximation to successfully analyze important deep learning phenomena and design algorithms for new applications. We show how NTKs allow us to generate adversarial examples in a "training-free" fashion, and demonstrate that they transfer to fool their finite-width neural net counterparts in the "lazy" regime. We leverage this connection to provide an alternative view on robust and non-robust features, which have been suggested to underlie the adversarial brittleness of neural nets. Specifically, we define and study features induced by the eigendecomposition of the kernel to better understand the role of robust and non-robust features, the reliance on both for standard classification, and the robustness-accuracy trade-off. We find that such features are surprisingly consistent across architectures, and that robust features tend to correspond to the largest eigenvalues of the model, and thus are learned early during training. Our framework allows us to identify and visualize non-robust yet useful features. Finally, we shed light on the robustness mechanism underlying adversarial training of neural nets used in practice: quantifying the evolution of the associated empirical NTK, we demonstrate that its dynamics falls much earlier into the "lazy" regime and manifests a much stronger form of the well-known bias to prioritize learning features within the top eigenspaces of the kernel, compared to standard training.
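The abstract describes two concrete constructions: a "training-free" attack built from the NTK kernel-regression predictor, and features read off the kernel's eigendecomposition. Below is a minimal sketch of both in JAX, assuming a toy two-layer ReLU network, synthetic ±1 labels, and illustrative hyperparameters (`eps`, `ridge`); none of the function names or settings come from the authors' code.

```python
import jax
import jax.numpy as jnp

def init_mlp(key, d_in, d_hidden):
    # Toy two-layer ReLU network (an illustrative stand-in for the paper's models).
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (d_in, d_hidden)) / jnp.sqrt(d_in),
        "w2": jax.random.normal(k2, (d_hidden, 1)) / jnp.sqrt(d_hidden),
    }

def mlp(params, x):
    # Scalar-output network f(theta, x); x has shape (n, d_in).
    return (jax.nn.relu(x @ params["w1"]) @ params["w2"]).squeeze(-1)

def ntk(params, x1, x2):
    # Empirical NTK: K(x1, x2)_ij = <df/dtheta (x1_i), df/dtheta (x2_j)>.
    flat = lambda j: jnp.concatenate(
        [v.reshape(v.shape[0], -1) for v in jax.tree_util.tree_leaves(j)], axis=1)
    j1 = flat(jax.jacobian(lambda p: mlp(p, x1))(params))
    j2 = flat(jax.jacobian(lambda p: mlp(p, x2))(params))
    return j1 @ j2.T

def ntk_fgsm(params, x_train, y_train, x, y, eps=0.1, ridge=1e-4):
    # "Training-free" attack: the NTK kernel-regression predictor
    # f(x) = K(x, X) (K(X, X) + ridge I)^{-1} y stands in for a trained net;
    # take one FGSM-style signed-gradient step that decreases its margin.
    k_tt = ntk(params, x_train, x_train)
    alpha = jnp.linalg.solve(k_tt + ridge * jnp.eye(len(x_train)), y_train)
    margin = lambda z: (y * (ntk(params, z, x_train) @ alpha)).sum()
    return x - eps * jnp.sign(jax.grad(margin)(x))

key = jax.random.PRNGKey(0)
params = init_mlp(key, d_in=16, d_hidden=256)
x_tr = jax.random.normal(key, (32, 16))
y_tr = jnp.sign(x_tr[:, 0] + 1e-9)                    # toy +/-1 labels
x_adv = ntk_fgsm(params, x_tr, y_tr, x_tr[:1], y_tr[:1])

# Features induced by the kernel's eigendecomposition: per the abstract, the
# largest-eigenvalue directions tend to be robust and learned earliest.
eigvals, eigvecs = jnp.linalg.eigh(ntk(params, x_tr, x_tr))  # ascending order
top_feature = eigvecs[:, -1]                                 # top eigenfeature
usefulness = jnp.abs(top_feature @ y_tr)                     # label alignment
```

Note that the attack never trains the network: the kernel-regression predictor is the closed-form limit of gradient-flow training in the lazy regime, which is why, per the abstract, perturbations found against it can transfer to fool finite-width nets.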
