Paper Title

Convergence Rates of Training Deep Neural Networks via Alternating Minimization Methods

Authors

Jintao Xu, Chenglong Bao, Wenxun Xing

Abstract

Training deep neural networks (DNNs) is an important and challenging optimization problem in machine learning due to its non-convexity and non-separable structure. Alternating minimization (AM) approaches split the composition structure of DNNs and have drawn great interest in the deep learning and optimization communities. In this paper, we propose a unified framework for analyzing the convergence rate of AM-type network training methods. Our analysis is based on non-monotone $j$-step sufficient decrease conditions and the Kurdyka-Łojasiewicz (KL) property, which relaxes the requirement of designing descent algorithms. We show detailed local convergence rates when the KL exponent $\theta$ varies in $[0,1)$. Moreover, local R-linear convergence is discussed under a stronger $j$-step sufficient decrease condition.
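As context for the abstract, here is a minimal sketch of the standard forms these assumptions usually take in the KL literature; the paper's precise $j$-step conditions may differ. A function $F$ is said to have the KL property at a critical point $x^{*}$ with exponent $\theta \in [0,1)$ if, near $x^{*}$,

\[
  \operatorname{dist}\bigl(0,\partial F(x)\bigr) \;\ge\; c\,\bigl(F(x)-F(x^{*})\bigr)^{\theta},
\]

and a one-step sufficient decrease condition for iterates $\{x^{k}\}$ typically reads

\[
  F(x^{k+1}) \;\le\; F(x^{k}) - a\,\|x^{k+1}-x^{k}\|^{2},
\]

with the non-monotone $j$-step variant relaxing this to hold only across blocks of $j$ iterations. Under the KL property, the classical local rates are: finite termination for $\theta = 0$, R-linear convergence for $\theta \in (0, 1/2]$, and a sublinear rate $O\bigl(k^{-(1-\theta)/(2\theta-1)}\bigr)$ for $\theta \in (1/2, 1)$.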
