Title
Towards Reliable Neural Specifications
Authors
Abstract
Having reliable specifications is an unavoidable challenge in achieving verifiable correctness, robustness, and interpretability of AI systems. Existing specifications for neural networks follow the paradigm of data as specification: the local neighborhood centered around a reference input is considered to be correct (or robust). While existing specifications contribute to verifying adversarial robustness, a significant problem in many research domains, our empirical study shows that those verified regions are somewhat tight, and thus fail to allow verification of test set inputs, making them impractical for some real-world applications. To this end, we propose a new family of specifications called neural representation as specification, which uses the intrinsic information of neural networks, namely neural activation patterns (NAPs), rather than input data, to specify the correctness and/or robustness of neural network predictions. We present a simple statistical approach to mining neural activation patterns. To show the effectiveness of the discovered NAPs, we formally verify several important properties, such as that various types of misclassification can never occur for a given NAP, and that there is no ambiguity between different NAPs. We show that by using NAPs, we can verify a significant region of the input space, while still recalling 84% of the data on MNIST. Moreover, we can push the verifiable bound to be 10 times larger on the CIFAR10 benchmark. Thus, we argue that NAPs can potentially be used as a more reliable and extensible specification for neural network verification.