Paper Title
Frank Wolfe Meets Metric Entropy
Paper Authors
Paper Abstract
The Frank-Wolfe algorithm has seen a resurgence in popularity due to its ability to efficiently solve constrained optimization problems in machine learning and high-dimensional statistics. As such, there is much interest in establishing when the algorithm may possess a "linear" $O(\log(1/\epsilon))$ dimension-free iteration complexity comparable to projected gradient descent. In this paper, we provide a general technique for establishing domain-specific and easy-to-estimate lower bounds for Frank-Wolfe and its variants using the metric entropy of the domain. Most notably, we show that a dimension-free linear upper bound must fail not only in the worst case, but in the \emph{average case}: for a Gaussian or spherical random polytope in $\mathbb{R}^d$ with $\mathrm{poly}(d)$ vertices, Frank-Wolfe requires up to $\tilde{\Omega}(d)$ iterations to achieve an $O(1/d)$ error bound, with high probability. We also establish this phenomenon for the nuclear norm ball. The link with metric entropy also has interesting positive implications for conditional gradient algorithms in statistics, such as gradient boosting and matching pursuit. In particular, we show that it is possible to extract fast-decaying upper bounds on the excess risk directly from an analysis of the underlying optimization procedure.
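For orientation, here is a minimal NumPy sketch of vanilla Frank-Wolfe in the Gaussian random-polytope setting the abstract describes. The objective, dimensions, and step-size schedule are illustrative choices of ours, not the paper's experiments; only the standard algorithm and the random-polytope domain come from the text.

```python
import numpy as np

def frank_wolfe(vertices, grad, x0, num_iters):
    """Vanilla Frank-Wolfe over the convex hull of `vertices`.

    The linear minimization oracle over a polytope is attained at a
    vertex, so it reduces to picking the vertex with the smallest
    inner product with the current gradient.
    """
    x = x0.copy()
    for t in range(num_iters):
        g = grad(x)
        s = vertices[np.argmin(vertices @ g)]  # LMO over the hull
        gamma = 2.0 / (t + 2)                  # standard open-loop step size
        x = x + gamma * (s - x)
    return x

# Gaussian random polytope in R^d with poly(d) vertices, mirroring the
# average-case setting in the abstract (sizes here are illustrative).
rng = np.random.default_rng(0)
d, n = 50, 50 ** 2
V = rng.standard_normal((n, d))

# Toy objective: squared distance to an interior point of the hull.
target = V.mean(axis=0)
grad = lambda x: x - target

x_hat = frank_wolfe(V, grad, x0=V[0], num_iters=200)
print(f"error after 200 iterations: {np.linalg.norm(x_hat - target):.4f}")
```

Tracking the error as `num_iters` grows in dimension-dependent increments is one simple way to observe the slow, dimension-dependent convergence behavior that the paper's metric-entropy lower bounds formalize.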