Paper Title

Magnitude and Angle Dynamics in Training Single ReLU Neurons

Authors

Sangmin Lee, Byeongsu Sim, Jong Chul Ye

Abstract

To understand the learning dynamics of deep ReLU networks, we investigate the dynamical system of the gradient flow $w(t)$ by decomposing it into magnitude $\|w(t)\|$ and angle $\phi(t) := \pi - \theta(t)$ components. In particular, for multi-layer single ReLU neurons with a spherically symmetric data distribution and the square loss function, we provide upper and lower bounds on the magnitude and angle components that describe the dynamics of the gradient flow. Using the obtained bounds, we conclude that small-scale initialization induces slow convergence for deep single ReLU neurons. Finally, by exploiting the relation between gradient flow and gradient descent, we extend our results to the gradient descent method. All theoretical results are verified by experiments.
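As a minimal sketch of the decomposition the abstract describes (not the authors' code), the snippet below trains a one-layer single ReLU neuron with gradient descent on the square loss over Gaussian (spherically symmetric) data and tracks the magnitude $\|w(t)\|$ and angle $\phi(t) = \pi - \theta(t)$ relative to a teacher direction. The teacher vector `v`, step size, sample size, and the two initialization scales are illustrative assumptions, not values from the paper.

```python
# Sketch: magnitude/angle decomposition of gradient descent on a single ReLU neuron.
# Assumptions (not from the paper): teacher v, lr, n, d, and init scales below.
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 4096                      # input dimension, number of samples
X = rng.standard_normal((n, d))      # spherically symmetric data (standard Gaussian)
v = np.ones(d) / np.sqrt(d)          # unit-norm teacher direction (assumed)
y = np.maximum(X @ v, 0.0)           # labels from a teacher single ReLU neuron

def train(init_scale, steps=2000, lr=0.1):
    """Gradient descent on L(w) = mean((relu(Xw) - y)^2) / 2; returns traces."""
    w = init_scale * rng.standard_normal(d)
    mags, phis = [], []
    for _ in range(steps):
        pre = X @ w
        resid = np.maximum(pre, 0.0) - y
        grad = X.T @ (resid * (pre > 0)) / n      # gradient of the square loss
        w -= lr * grad
        cos = np.clip(w @ v / (np.linalg.norm(w) + 1e-12), -1.0, 1.0)
        mags.append(np.linalg.norm(w))            # magnitude component ||w(t)||
        phis.append(np.pi - np.arccos(cos))       # angle component phi(t) = pi - theta(t)
    return np.array(mags), np.array(phis)

for scale in (1e-3, 1.0):
    mags, phis = train(scale)
    print(f"init scale {scale:g}: final |w| = {mags[-1]:.3f}, final phi = {phis[-1]:.3f}")
```

Comparing the two runs illustrates the abstract's qualitative claim: under small-scale initialization the magnitude $\|w(t)\|$ grows slowly at first, so convergence is visibly delayed relative to unit-scale initialization.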
