Paper Title

projUNN: efficient method for training deep networks with unitary matrices

Authors

Bobak Kiani, Randall Balestriero, Yann LeCun, Seth Lloyd

Abstract

In learning with recurrent or very deep feed-forward networks, employing unitary matrices in each layer can be very effective at maintaining long-range stability. However, restricting network parameters to be unitary typically comes at the cost of expensive parameterizations or increased training runtime. We propose instead an efficient method based on rank-$k$ updates -- or their rank-$k$ approximation -- that maintains performance at a nearly optimal training runtime. We introduce two variants of this method, named Direct (projUNN-D) and Tangent (projUNN-T) projected Unitary Neural Networks, that can parameterize full $N$-dimensional unitary or orthogonal matrices with a training runtime scaling as $O(kN^2)$. Our method either projects low-rank gradients onto the closest unitary matrix (projUNN-T) or transports unitary matrices in the direction of the low-rank gradient (projUNN-D). Even in the fastest setting ($k=1$), projUNN is able to train a model's unitary parameters to reach performance comparable to baseline implementations. In recurrent neural network settings, projUNN closely matches or exceeds benchmarked results from prior unitary neural networks. Finally, we preliminarily explore projUNN in training orthogonal convolutional neural networks, which are currently unable to outperform state-of-the-art models but can potentially enhance stability and robustness at large depth.
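As a rough illustration of the update described in the abstract, the sketch below takes a rank-$k$ approximation of a gradient and projects the updated parameter matrix back onto the nearest unitary (here, real orthogonal) matrix via a polar projection. This is only a minimal NumPy sketch of the idea, not the authors' implementation: the function names are invented for this example, and the full SVDs used here cost $O(N^3)$, whereas projUNN exploits the rank-$k$ structure of the update to reach the $O(kN^2)$ scaling quoted above.

```python
import numpy as np

def rank_k_approx(G, k):
    """Best rank-k approximation of a gradient matrix (truncated SVD)."""
    U, s, Vh = np.linalg.svd(G, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vh[:k, :]

def nearest_unitary(M):
    """Polar projection: the unitary/orthogonal matrix closest to M in Frobenius norm."""
    U, _, Vh = np.linalg.svd(M)
    return U @ Vh

def projection_step(W, G, lr=0.05, k=1):
    """One illustrative update: step along a rank-k gradient approximation,
    then project the result back onto the unitary manifold."""
    return nearest_unitary(W - lr * rank_k_approx(G, k))

# Toy usage: the parameter matrix stays orthogonal after the update.
rng = np.random.default_rng(0)
W = nearest_unitary(rng.standard_normal((8, 8)))   # random orthogonal start
G = rng.standard_normal((8, 8))                    # stand-in for a backprop gradient
W = projection_step(W, G, lr=0.05, k=1)
print(np.allclose(W.T @ W, np.eye(8), atol=1e-8))  # True: still orthogonal
```

For the SVD $M = U \Sigma V^\dagger$, the polar factor $U V^\dagger$ minimizes $\|W - M\|_F$ over unitary $W$, which is the sense of "closest unitary matrix" used in the abstract; the tangent variant instead moves the unitary matrix along the direction of the low-rank gradient rather than projecting the result.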
