Paper Title
Methodology and first-order algorithms for solving nonsmooth and non-strongly convex bilevel optimization problems
Paper Authors
Paper Abstract
Simple bilevel problems are optimization problems in which we want to find an optimal solution to an inner problem that minimizes an outer objective function. Such problems appear in many machine learning and signal processing applications as a way to eliminate undesirable solutions. However, since these problems do not satisfy regularity conditions, they are often hard to solve exactly and are usually solved via iterative regularization. In the past few years, several algorithms have been proposed to solve these bilevel problems directly and provide a rate for obtaining feasibility, assuming that the outer function is strongly convex. In our work, we suggest a new approach that is designed for bilevel problems with simple outer functions, such as the $l_1$ norm, which are not required to be either smooth or strongly convex. In our new ITerative Approximation and Level-set EXpansion (ITALEX) approach, we alternate between expanding the level-set of the outer function and approximately optimizing the inner problem over this level-set. We show that optimizing the inner function through first-order methods such as proximal gradient and generalized conditional gradient results in a feasibility convergence rate of $O(1/k)$, which up to now was a rate only achieved by bilevel algorithms for smooth and strongly convex outer functions. Moreover, we prove an $O(1/\sqrt{k})$ rate of convergence for the outer function, contrary to existing methods, which only provide asymptotic guarantees. We demonstrate this performance through numerical experiments.
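The alternating scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' actual ITALEX specification (the expansion increment, inner-loop budget, and toy problem below are all hypothetical choices for illustration): the inner objective is a least-squares loss, the outer function is the $l_1$ norm, and each round does a few projected-gradient steps on the inner problem over the current level-set $\{x : \|x\|_1 \le \alpha\}$ before expanding $\alpha$.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection onto {x : ||x||_1 <= radius} via the sort-based method."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                     # sorted magnitudes, descending
    cssv = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > cssv - radius)[0][-1]
    theta = (cssv[rho] - radius) / (rho + 1.0)       # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def italex_sketch(A, b, outer_expansions=30, inner_steps=50, delta=0.1):
    """Toy alternation: inner g(x) = 0.5||Ax - b||^2, outer f(x) = ||x||_1.

    Each round: (1) expand the level-set parameter alpha by a hypothetical
    increment delta; (2) run projected-gradient steps on g over
    {x : ||x||_1 <= alpha}.
    """
    n = A.shape[1]
    x = np.zeros(n)
    alpha = 0.0
    step = 1.0 / np.linalg.norm(A, 2) ** 2           # 1/L, L = Lipschitz const of grad g
    for _ in range(outer_expansions):
        alpha += delta                               # level-set expansion
        for _ in range(inner_steps):
            grad = A.T @ (A @ x - b)
            x = project_l1_ball(x - step * grad, alpha)
    return x, alpha
```

The projection step is what makes each inner pass a constrained proximal-gradient iteration over the current level-set; swapping it for a linear-minimization oracle over the same set would give the generalized conditional-gradient variant mentioned in the abstract.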