Paper Title

Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods

Paper Authors

Shunta Akiyama and Taiji Suzuki

Paper Abstract

While deep learning has outperformed other methods on various tasks, the theoretical frameworks that explain why have not been fully established. To address this issue, we investigate the excess risk of two-layer ReLU neural networks in a teacher-student regression model, in which a student network learns an unknown teacher network through its outputs. In particular, we consider a student network that has the same width as the teacher network and is trained in two phases: first by noisy gradient descent and then by vanilla gradient descent. Our result shows that the student network provably reaches a near-global optimal solution and outperforms any kernel method estimator (more generally, any linear estimator), including the neural tangent kernel approach, the random feature model, and other kernel methods, in the sense of the minimax optimal rate. The key concept behind this superiority is the non-convexity of the neural network models: even though the loss landscape is highly non-convex, the student network adaptively learns the teacher neurons.
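To make the two-phase training procedure concrete, below is a minimal NumPy sketch of the teacher-student setup described in the abstract: a width-m ReLU teacher, a student of the same width, a noisy gradient descent phase (written here as a Langevin-type update), followed by a vanilla gradient descent phase. The square loss, Gaussian data, fixed second layer, and all hyperparameter values are illustrative assumptions of this sketch, not the paper's actual algorithm or constants.

```python
import numpy as np

rng = np.random.default_rng(0)

# Problem sizes: input dimension, network width (teacher == student), sample size.
d, m, n = 10, 5, 1000

def relu(z):
    return np.maximum(z, 0.0)

# Teacher network f*(x) = sum_j a*_j relu(<w*_j, x>), fixed and unknown to the student.
W_star = rng.normal(size=(m, d)) / np.sqrt(d)
a_star = rng.choice([-1.0, 1.0], size=m)

# Training data: Gaussian inputs and noisy teacher outputs (illustrative choices).
X = rng.normal(size=(n, d))
y = relu(X @ W_star.T) @ a_star + 0.1 * rng.normal(size=n)

# Student of the same width m; for simplicity only the first layer is trained
# and the second layer is fixed (an assumption of this sketch, not of the paper).
W = rng.normal(size=(m, d)) / np.sqrt(d)
a = a_star.copy()

def loss_and_grad(W):
    H = relu(X @ W.T)   # (n, m) hidden activations
    r = H @ a - y       # residuals
    # dL/dw_j = (1/n) sum_i r_i * a_j * 1[<w_j, x_i> > 0] * x_i  (ReLU subgradient)
    G = ((r[:, None] * (X @ W.T > 0)) * a).T @ X / n
    return 0.5 * np.mean(r ** 2), G

eta, beta = 0.05, 1e4  # step size and inverse temperature (illustrative values)

# Phase 1: noisy gradient descent (a Langevin-type update) to explore the landscape.
for _ in range(2000):
    _, G = loss_and_grad(W)
    W = W - eta * G + np.sqrt(2.0 * eta / beta) * rng.normal(size=W.shape)

# Phase 2: vanilla gradient descent to converge to a nearby (near-global) optimum.
for _ in range(2000):
    _, G = loss_and_grad(W)
    W = W - eta * G

print("final training loss:", loss_and_grad(W)[0])
```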
