Paper Title
Convergence Rates of Accelerated Markov Gradient Descent with Applications in Reinforcement Learning
Paper Authors
Paper Abstract
Motivated by broad applications in machine learning, we study the popular accelerated stochastic gradient descent (ASGD) algorithm for solving (possibly nonconvex) optimization problems. We characterize the finite-time performance of this method when the gradients are sampled from Markov processes, and hence biased and dependent from time step to time step; in contrast, the analysis in existing work relies heavily on the stochastic gradients being independent and sometimes unbiased. Our main contributions show that under certain (standard) assumptions on the underlying Markov chain generating the gradients, ASGD converges at nearly the same rate with Markovian gradient samples as with independent gradient samples. The only difference is a logarithmic factor that accounts for the mixing time of the Markov chain. One of the key motivations for this study is the class of complicated control problems that can be modeled as Markov decision processes and solved using reinforcement learning. We apply the accelerated method to several challenging problems in OpenAI Gym and MuJoCo, and show that acceleration can significantly improve the performance of the classic temporal difference learning and REINFORCE algorithms.
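To make the setting concrete, below is a minimal sketch (not the authors' exact scheme) of Nesterov-style accelerated SGD in which each stochastic gradient is evaluated at a state drawn along a single Markov chain trajectory, so consecutive gradients are biased and correlated rather than i.i.d.; the oracle `grad(theta, x)`, the chain sampler `transition(x)`, and the step-size choices are hypothetical placeholders.

```python
import numpy as np

def accelerated_markov_sgd(grad, transition, theta0, x0, alpha=1e-2, beta=0.9, T=1000):
    """Sketch of accelerated SGD with Markovian gradient samples.

    grad(theta, x): stochastic gradient of the objective at theta, evaluated at chain state x.
    transition(x):  samples the next state of the Markov chain given the current state x.
    alpha, beta:    step size and momentum parameter (illustrative values).
    """
    theta = np.asarray(theta0, dtype=float)
    theta_prev = theta.copy()
    x = x0
    for _ in range(T):
        x = transition(x)                         # next sample along one chain trajectory (dependent on x)
        y = theta + beta * (theta - theta_prev)   # Nesterov extrapolation (momentum) step
        g = grad(y, x)                            # biased, correlated gradient estimate
        theta_prev = theta
        theta = y - alpha * g                     # gradient step at the extrapolated point
    return theta
```

The extrapolation step `y = theta + beta * (theta - theta_prev)` is the acceleration mechanism; the only difference from the standard i.i.d. setting is that `x` evolves along one Markov trajectory, which is the source of the mixing-time logarithmic factor mentioned in the abstract.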