Paper Title
Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems
Paper Authors
Paper Abstract
We consider nonconvex-concave minimax optimization problems of the form $\min_{\bf x}\max_{\bf y\in{\mathcal Y}} f({\bf x},{\bf y})$, where $f$ is strongly-concave in $\bf y$ but possibly nonconvex in $\bf x$, and ${\mathcal Y}$ is a convex and compact set. We focus on the stochastic setting, where we can only access an unbiased stochastic gradient estimate of $f$ at each iteration. This formulation includes many machine learning applications, such as robust optimization and adversarial training, as special cases. We are interested in finding an ${\mathcal O}(\varepsilon)$-stationary point of the function $\Phi(\cdot)=\max_{\bf y\in{\mathcal Y}} f(\cdot, {\bf y})$. The most popular algorithm to solve this problem is stochastic gradient descent ascent, which requires ${\mathcal O}(\kappa^3\varepsilon^{-4})$ stochastic gradient evaluations, where $\kappa$ is the condition number. In this paper, we propose a novel method called Stochastic Recursive gradiEnt Descent Ascent (SREDA), which estimates gradients more efficiently using variance reduction. This method achieves the best known stochastic gradient complexity of ${\mathcal O}(\kappa^3\varepsilon^{-3})$, and its dependence on $\varepsilon$ is optimal for this problem.
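As a concrete illustration of the problem setup, here is a minimal sketch of the projected stochastic gradient descent ascent (SGDA) baseline mentioned in the abstract, run on a toy nonconvex-strongly-concave objective. The projection radius, step sizes, noise model, and test function are illustrative assumptions and are not taken from the paper, which instead proposes SREDA, a variance-reduced variant.

```python
import numpy as np

def project_ball(y, radius=1.0):
    """Euclidean projection onto Y = {y : ||y||_2 <= radius}, a convex compact set."""
    nrm = np.linalg.norm(y)
    return y if nrm <= radius else y * (radius / nrm)

def sgda(grad_x, grad_y, x0, y0, eta_x=1e-2, eta_y=5e-2,
         iters=5000, noise_std=0.1, seed=0):
    """Projected stochastic gradient descent ascent: descend in x, ascend in y."""
    rng = np.random.default_rng(seed)
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    for _ in range(iters):
        # Unbiased stochastic gradient estimates: true gradient plus zero-mean noise.
        gx = grad_x(x, y) + noise_std * rng.standard_normal(x.shape)
        gy = grad_y(x, y) + noise_std * rng.standard_normal(y.shape)
        x = x - eta_x * gx                 # gradient descent step on x
        y = project_ball(y + eta_y * gy)   # projected gradient ascent step on y
    return x, y

# Toy instance (illustrative, not from the paper):
# f(x, y) = sum_i x_i^2 / (1 + x_i^2) + <x, y> - 0.5 * ||y||^2,
# which is nonconvex in x and 1-strongly-concave in y.
grad_x = lambda x, y: 2 * x / (1 + x**2) ** 2 + y
grad_y = lambda x, y: x - y

x_out, y_out = sgda(grad_x, grad_y, x0=np.ones(5), y0=np.zeros(5))
print("approximate stationary point x:", x_out)
```

SREDA replaces the fresh stochastic gradients in each iteration with recursive, variance-reduced estimators, which is what lowers the complexity from ${\mathcal O}(\kappa^3\varepsilon^{-4})$ to ${\mathcal O}(\kappa^3\varepsilon^{-3})$; the sketch above only shows the baseline update structure.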