论文标题
凝结的内点方法:在GPU硬件上移植缩小空间方法
Condensed interior-point methods: porting reduced-space approaches on GPU hardware
论文作者
论文摘要
内点方法(IPM)已成为非线性编程的主力方法。 IPM的性能与用于分解算法的每次迭代时使用的线性求解器直接相关。在解决大规模的非线性问题时,最先进的IPM求解器依靠有效的稀疏线性求解器来求解KKT系统。取而代之的是,我们提出了一种新颖的空间IPM算法,该算法将KKT系统凝结成一个密集的矩阵,其大小与问题中的自由度成正比。根据还原发生的位置,我们得出了缩小空间方法的两个变体:线性化 - 然后还原和减少线性化。我们适应了他们的工作流程,以便在GPU上加速了绝大多数计算。我们就最佳功率流问题提供了广泛的数值结果,将我们的GPU加速缩小的Space IPM与Knitro和Hybrid Full Space IPM算法进行了比较。通过评估GPU上的衍生物并求解CPU上的KKT系统,混合溶液已经明显快于仅CPU溶液。通过完全在GPU上求解KKT系统,这两种缩小的空间算法进一步迈进了一步。正如预期的那样,两种还原算法的性能本质上取决于可用自由度的数量:当问题具有多个自由度时,它们的性能很差,但是两种算法的相对自由度的相对程度变小,两种算法的速度比Knitro快3倍。
The interior-point method (IPM) has become the workhorse method for nonlinear programming. The performance of IPM is directly related to the linear solver employed to factorize the Karush--Kuhn--Tucker (KKT) system at each iteration of the algorithm. When solving large-scale nonlinear problems, state-of-the art IPM solvers rely on efficient sparse linear solvers to solve the KKT system. Instead, we propose a novel reduced-space IPM algorithm that condenses the KKT system into a dense matrix whose size is proportional to the number of degrees of freedom in the problem. Depending on where the reduction occurs we derive two variants of the reduced-space method: linearize-then-reduce and reduce-then-linearize. We adapt their workflow so that the vast majority of computations are accelerated on GPUs. We provide extensive numerical results on the optimal power flow problem, comparing our GPU-accelerated reduced space IPM with Knitro and a hybrid full space IPM algorithm. By evaluating the derivatives on the GPU and solving the KKT system on the CPU, the hybrid solution is already significantly faster than the CPU-only solutions. The two reduced-space algorithms go one step further by solving the KKT system entirely on the GPU. As expected, the performance of the two reduction algorithms depends intrinsically on the number of available degrees of freedom: their performance is poor when the problem has many degrees of freedom, but the two algorithms are up to 3 times faster than Knitro as soon as the relative number of degrees of freedom becomes smaller.