Paper Title
Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power
Paper Authors
Paper Abstract
It is well known that modern neural networks are vulnerable to adversarial examples. To mitigate this problem, a series of robust learning algorithms have been proposed. However, although the robust training error can be near zero via some methods, all existing algorithms lead to a high robust generalization error. In this paper, we provide a theoretical understanding of this puzzling phenomenon from the perspective of the expressive power of deep neural networks. Specifically, for binary classification problems with well-separated data, we show that, for ReLU networks, while mild over-parameterization is sufficient for high robust training accuracy, there exists a constant robust generalization gap unless the size of the neural network is exponential in the data dimension $d$. This result holds even if the data is linearly separable (which means achieving standard generalization is easy), and more generally for any parameterized function class as long as its VC dimension is at most polynomial in the number of parameters. Moreover, we establish an improved upper bound of $\exp({\mathcal{O}}(k))$ on the network size needed to achieve low robust generalization error when the data lies on a manifold with intrinsic dimension $k$ ($k \ll d$). Nonetheless, we also have a lower bound that grows exponentially with respect to $k$ -- the curse of dimensionality is inevitable. By demonstrating an exponential separation between the network sizes required for achieving low robust training error and low robust generalization error, our results reveal that the hardness of robust generalization may stem from the expressive power of practical models.
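To make the distinction between standard and robust accuracy concrete, here is a minimal, hypothetical sketch (not from the paper) on linearly separable data: a linear classifier attains perfect clean accuracy, and its robust accuracy under worst-case $\ell_\infty$ perturbations stays perfect while the perturbation budget is below the margin but degrades once the budget exceeds it. All data, the margin value, and the helper names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data in d dimensions: the label is the sign of the
# first coordinate, and we enforce a margin of 0.3 along that coordinate.
d, n = 10, 200
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0])
X[:, 0] += 0.3 * y  # now |x_1| >= 0.3 for every point

# A linear classifier that fits the data perfectly: w = e_1.
w = np.zeros(d)
w[0] = 1.0

def accuracy(X, y, w):
    """Standard (clean) accuracy of sign(w @ x)."""
    return np.mean(np.sign(X @ w) == y)

def robust_accuracy(X, y, w, eps):
    """Accuracy under the worst-case l_inf perturbation of radius eps.

    Against a linear model, the optimal attack moves each point by
    eps * sign(w) in the direction that decreases y * (w @ x).
    """
    X_adv = X - eps * y[:, None] * np.sign(w)[None, :]
    return np.mean(np.sign(X_adv @ w) == y)

print(accuracy(X, y, w))              # 1.0: data fits perfectly
print(robust_accuracy(X, y, w, 0.1))  # 1.0: eps below the 0.3 margin
print(robust_accuracy(X, y, w, 1.0))  # < 1.0: eps exceeds the margin
```

The paper's point is subtler than this toy: even when robust *training* accuracy is high, achieving low robust *generalization* error can require exponentially larger networks, a separation this sketch does not capture.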