Paper Title

On the High Symmetry of Neural Network Functions

Paper Author

Michelucci, Umberto

Paper Abstract

Training neural networks means solving a high-dimensional optimization problem. Normally the goal is to minimize a loss function that depends on the so-called network function, in other words the function that gives the network output for a given input. This function depends on a large number of parameters, also known as weights, whose number depends on the network architecture. In general, the goal of this optimization problem is to find the global minimum of the loss function. In this paper it is discussed how, due to the way neural networks are designed, the network function presents a very large symmetry in the parameter space. This work shows how the network function has a number of equivalent minima, in other words minima that give the same value of the loss function and the same exact output, that grows factorially with the number of neurons in each layer for feed-forward neural networks, or with the number of filters in convolutional neural networks. When the number of neurons and layers is large, the number of equivalent minima grows extremely fast. This of course has consequences for the study of how neural networks converge to minima during training. This result is known, but in this paper a proper mathematical discussion is presented for the first time and an estimate of the number of equivalent minima is derived.
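
To make the symmetry discussed in the abstract concrete, the following is a minimal numerical sketch (not taken from the paper; all layer sizes and variable names are illustrative assumptions). It checks that permuting the hidden neurons of a one-hidden-layer feed-forward network, together with the matching rows of the first weight matrix, entries of the bias, and columns of the second weight matrix, leaves the network function, and therefore the loss, unchanged, so a single hidden layer with n neurons already contributes n! equivalent weight configurations; stacking layers multiplies these counts.

import math
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes (assumed, not from the paper).
n_in, n_hidden, n_out = 4, 5, 3
W1 = rng.normal(size=(n_hidden, n_in))   # input -> hidden weights
b1 = rng.normal(size=n_hidden)           # hidden biases
W2 = rng.normal(size=(n_out, n_hidden))  # hidden -> output weights
b2 = rng.normal(size=n_out)              # output biases

def network(x, W1, b1, W2, b2):
    # One-hidden-layer feed-forward network function with ReLU activation.
    h = np.maximum(0.0, W1 @ x + b1)
    return W2 @ h + b2

# Apply the same permutation to the hidden units everywhere they appear:
# rows of W1, entries of b1, and columns of W2.
perm = rng.permutation(n_hidden)
W1_p, b1_p, W2_p = W1[perm], b1[perm], W2[:, perm]

x = rng.normal(size=n_in)
assert np.allclose(network(x, W1, b1, W2, b2),
                   network(x, W1_p, b1_p, W2_p, b2))

# Every permutation of the n_hidden neurons gives a different point in
# parameter space with identical output, so this single layer alone
# yields factorial(n_hidden) equivalent configurations.
print("equivalent configurations from this layer:", math.factorial(n_hidden))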
