Title

Extended critical regimes of deep neural networks

Authors

Cheng Kevin Qu, Asem Wardak, Pulin Gong

Abstract

Deep neural networks (DNNs) have been successfully applied to many real-world problems, but a complete understanding of their dynamical and computational principles is still lacking. Conventional theoretical frameworks for analysing DNNs often assume random networks with coupling weights obeying Gaussian statistics. However, non-Gaussian, heavy-tailed coupling is a ubiquitous phenomenon in DNNs. Here, by weaving together theories of heavy-tailed random matrices and non-equilibrium statistical physics, we develop a new type of mean field theory for DNNs which predicts that heavy-tailed weights enable the emergence of an extended critical regime without fine-tuning parameters. In this extended critical regime, DNNs exhibit rich and complex propagation dynamics across layers. We further elucidate that the extended criticality endows DNNs with profound computational advantages: balancing the contraction as well as expansion of internal neural representations and speeding up training processes, hence providing a theoretical guide for the design of efficient neural architectures.
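
The abstract's central claim is that heavy-tailed weights produce critical signal propagation over a wide range of scales, whereas Gaussian weights require fine-tuning the variance. Below is a minimal illustrative sketch of that setup, not the authors' code: it tracks how the distance between two nearby inputs evolves through a deep tanh network under Gaussian versus heavy-tailed (Cauchy, i.e. alpha = 1 stable) weights. The width, depth, perturbation size, and the standard random-matrix scalings (1/sqrt(N) for Gaussian, 1/N for Cauchy) are illustrative assumptions.

```python
# A minimal sketch (assumptions noted above), comparing perturbation
# propagation through a deep tanh network for Gaussian vs Cauchy weights.
import numpy as np

rng = np.random.default_rng(0)
width, depth = 200, 50

def propagate(weight_sampler, scale):
    """Return the layer-by-layer distance between two nearby inputs."""
    x = rng.standard_normal(width)
    y = x + 1e-6 * rng.standard_normal(width)  # slightly perturbed copy
    gaps = []
    for _ in range(depth):
        W = scale * weight_sampler((width, width))  # fresh random layer
        x, y = np.tanh(W @ x), np.tanh(W @ y)
        gaps.append(np.linalg.norm(x - y))
    return np.array(gaps)

# Gaussian weights: whether the gap shrinks or blows up depends sharply
# on the chosen scale (criticality needs fine-tuning). Heavy-tailed
# weights are claimed to stay near-critical over an extended range.
gauss = propagate(lambda s: rng.standard_normal(s), 1.0 / np.sqrt(width))
cauchy = propagate(lambda s: rng.standard_cauchy(s), 1.0 / width)

print("final gap, Gaussian:", gauss[-1])
print("final gap, Cauchy  :", cauchy[-1])
```

Rerunning the sketch while sweeping the scale factor shows the intended contrast: the Gaussian network transitions abruptly between contracting and expanding the gap, while the hypothetical heavy-tailed variant stays in between across a broad band of scales.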
