Paper Title
Automatic, Dynamic, and Nearly Optimal Learning Rate Specification by Local Quadratic Approximation
Paper Authors
Abstract
In deep learning tasks, the learning rate determines the update step size in each iteration, which plays a critical role in gradient-based optimization. However, the determination of an appropriate learning rate in practice typically relies on subjective judgment. In this work, we propose a novel optimization method based on local quadratic approximation (LQA). In each update step, given the gradient direction, we locally approximate the loss function by a standard quadratic function of the learning rate. Then, we propose an approximation step to obtain a nearly optimal learning rate in a computationally efficient way. The proposed LQA method has three important features. First, the learning rate is automatically determined in each update step. Second, it is dynamically adjusted according to the current loss function value and the parameter estimates. Third, with the gradient direction fixed, the proposed method leads to nearly the greatest reduction in the loss function. Extensive experiments have been conducted to demonstrate the strengths of the proposed LQA method.
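To make the idea concrete, here is a minimal sketch of the core mechanism the abstract describes: along the negative-gradient direction, the loss is probed at a few step sizes, a quadratic in the learning rate is fitted through the probes, and its minimizer is taken as the (nearly optimal) learning rate. This is only an illustration under simplifying assumptions, not the authors' implementation; the function name `lqa_learning_rate`, the probe step `delta`, and the three-point fitting scheme are all illustrative choices.

```python
import numpy as np

def lqa_learning_rate(loss_fn, theta, grad, delta=0.1):
    """Estimate a nearly optimal learning rate by fitting a quadratic
    to the loss along the negative-gradient direction.

    Probes the loss at step sizes 0, delta, and 2*delta, fits
    q(eta) = a*eta^2 + b*eta + c through the three points, and
    returns the minimizer -b / (2a) when the fit is convex.
    (Illustrative sketch; not the paper's exact procedure.)
    """
    f0 = loss_fn(theta)                     # loss at eta = 0
    f1 = loss_fn(theta - delta * grad)      # loss at eta = delta
    f2 = loss_fn(theta - 2 * delta * grad)  # loss at eta = 2*delta

    # Coefficients of the interpolating quadratic for equally
    # spaced probe points 0, delta, 2*delta.
    a = (f2 - 2 * f1 + f0) / (2 * delta ** 2)
    b = (4 * f1 - 3 * f0 - f2) / (2 * delta)

    if a <= 0:  # non-convex fit: fall back to the probe step size
        return delta
    return -b / (2 * a)

# Demo on a least-squares loss L(theta) = ||A theta - y||^2, for
# which the loss along any direction is exactly quadratic, so the
# fitted learning rate is the exact line-search minimizer.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
y = rng.normal(size=20)
loss = lambda th: np.sum((A @ th - y) ** 2)

theta = np.zeros(5)
for _ in range(50):
    grad = 2 * A.T @ (A @ theta - y)      # gradient of the loss
    eta = lqa_learning_rate(loss, theta, grad)
    theta = theta - eta * grad            # gradient step with fitted eta
```

On this toy problem the iterate converges to the least-squares solution without any hand-tuned learning rate, which is the behavior the abstract claims for LQA; for non-quadratic deep-learning losses the fit is only a local approximation, hence "nearly optimal".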