Title

Random matrix analysis of deep neural network weight matrices

Authors

Matthias Thamm, Max Staats, Bernd Rosenow

Abstract

Neural networks have been used successfully in a variety of fields, which has led to a great deal of interest in developing a theoretical understanding of how they store the information needed to perform a particular task. We study the weight matrices of trained deep neural networks using methods from random matrix theory (RMT) and show that the statistics of most of the singular values follow universal RMT predictions. This suggests that they are random and do not contain system-specific information, which we investigate further by comparing the statistics of eigenvector entries to the universal Porter-Thomas distribution. We find that for most eigenvectors the hypothesis of randomness cannot be rejected, and that only eigenvectors belonging to the largest singular values deviate from the RMT prediction, indicating that they may encode learned information. In addition, a comparison with RMT predictions also allows one to distinguish networks trained in different learning regimes, from lazy to rich learning. We analyze the spectral distribution of the large singular values using the Hill estimator and find that the distribution cannot in general be characterized by a tail index, i.e., it is not of power-law type.
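As a rough illustration of the diagnostics the abstract names, the sketch below (assuming numpy; the helper `hill_estimator` and all thresholds are illustrative, not the authors' code) checks a random Gaussian "weight matrix" against two RMT baselines — the Marchenko-Pastur spectral edge for its singular values and Gaussian (Porter-Thomas-consistent) statistics for a bulk singular-vector's entries — and applies the Hill estimator to a sample with a genuine power-law tail for comparison:

```python
import numpy as np

def hill_estimator(samples, k):
    """Hill estimator of the tail index from the k largest samples.

    For order statistics x_(1) >= ... >= x_(n), the estimate is
    alpha_hat = k / sum_{i=1}^{k} ln(x_(i) / x_(k+1)).
    """
    x = np.sort(samples)[::-1]       # descending order
    logs = np.log(x[:k] / x[k])      # x[k] plays the role of x_(k+1)
    return k / np.sum(logs)

rng = np.random.default_rng(0)

# Square i.i.d. Gaussian matrix with entries of variance 1/N: for large N
# the singular values follow the Marchenko-Pastur law with upper edge 2,
# so the largest singular value should sit close to 2.
N = 1000
W = rng.normal(scale=1.0 / np.sqrt(N), size=(N, N))
U, s, Vt = np.linalg.svd(W)
print(f"largest singular value: {s[0]:.3f} (MP edge: 2.0)")

# Entries of a bulk singular vector, rescaled by sqrt(N), should look like
# standard Gaussian draws (Porter-Thomas statistics for their squares);
# a quick moment check: the kurtosis ratio E[v^4]/E[v^2]^2 should be ~3.
v = np.sqrt(N) * Vt[N // 2]
kurt = np.mean(v ** 4) / np.mean(v ** 2) ** 2
print(f"kurtosis of bulk singular-vector entries: {kurt:.2f} (Gaussian: 3)")

# A genuine power-law sample (Pareto with tail index 2, via inverse CDF)
# gives a stable Hill estimate near 2; per the abstract, the large
# singular values of trained networks generally do not behave this way.
pareto = rng.uniform(size=10_000) ** (-1.0 / 2.0)
print(f"Hill tail index of Pareto(2) sample: {hill_estimator(pareto, 500):.2f}")
```

The moment check is a deliberately lightweight stand-in for the full Porter-Thomas comparison in the paper, which works with the distribution of squared eigenvector entries rather than a single kurtosis ratio.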
