Paper Title

Incorporating the Barzilai-Borwein Adaptive Step Size into Subgradient Methods for Deep Network Training

Paper Authors

Antonio Robles-Kelly, Asef Nazari

Abstract

In this paper, we incorporate the Barzilai-Borwein step size into gradient descent methods used to train deep networks. This allows us to adapt the learning rate using a two-point approximation to the secant equation which quasi-Newton methods are based upon. Moreover, the adaptive learning rate method presented here is quite general in nature and can be applied to widely used gradient descent approaches such as Adagrad and RMSprop. We evaluate our method using standard example network architectures on widely available datasets and compare against alternatives elsewhere in the literature. In our experiments, our adaptive learning rate shows a smoother and faster convergence than that exhibited by the alternatives, with better or comparable performance.
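The abstract does not spell the step size out, but the standard two-point Barzilai-Borwein formula it refers to is alpha_k = (s^T s) / (s^T y), with s = x_k - x_{k-1} and y = g_k - g_{k-1}, which approximately satisfies the secant equation of quasi-Newton methods. The sketch below is a minimal illustration of plugging that step size into plain gradient descent; the quadratic test objective, the safeguarding bounds, and all names are assumptions for illustration, not the authors' implementation.

import numpy as np

def bb_gradient_descent(grad, x0, n_iter=100, alpha0=1e-3,
                        alpha_min=1e-8, alpha_max=1e2):
    """Illustrative sketch (not the paper's code): gradient descent
    with the Barzilai-Borwein (BB1) step size

        alpha_k = (s^T s) / (s^T y),  s = x_k - x_{k-1},  y = g_k - g_{k-1},

    a two-point approximation to the secant equation underlying
    quasi-Newton methods.
    """
    x_prev = x0
    g_prev = grad(x0)
    x = x0 - alpha0 * g_prev          # bootstrap with a fixed first step
    for _ in range(n_iter):
        g = grad(x)
        s = x - x_prev                # iterate difference
        y = g - g_prev                # gradient difference
        sy = s @ y
        # Safeguard: fall back to the bounds when curvature is degenerate
        alpha = (s @ s) / sy if sy > 0 else alpha_max
        alpha = min(max(alpha, alpha_min), alpha_max)
        x_prev, g_prev = x, g
        x = x - alpha * g
    return x

# Toy usage on a convex quadratic f(x) = 0.5 x^T A x - b^T x
A = np.diag([1.0, 10.0, 100.0])
b = np.ones(3)
x_star = bb_gradient_descent(lambda x: A @ x - b, x0=np.zeros(3))
print(x_star)                         # approaches the solution of A x = b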
