Paper Title
GPU-Accelerated Primal Learning for Extremely Fast Large-Scale Classification
Paper Authors
Paper Abstract
One of the most efficient methods to solve L2-regularized primal problems, such as logistic regression and linear support vector machine (SVM) classification, is the widely used trust region Newton algorithm, TRON. While TRON has recently been shown to enjoy substantial speedups on shared-memory multi-core systems, exploiting graphics processing units (GPUs) to speed up the method is significantly more difficult, owing to the highly complex and heavily sequential nature of the algorithm. In this work, we show that using judicious GPU optimization principles, TRON training time for different losses and feature representations may be drastically reduced. For sparse feature sets, we show that using GPUs to train logistic regression classifiers in LIBLINEAR is up to an order of magnitude faster than solely using multithreading. For dense feature sets--which impose far more stringent memory constraints--we show that GPUs substantially reduce the lengthy SVM learning times required for state-of-the-art proteomics analysis, leading to dramatic improvements over recently proposed speedups. Furthermore, we show how GPU acceleration may be mixed with multithreading when the dataset is too large to fit in GPU memory; on a massive dense proteomics dataset of nearly a quarter-billion data instances, this mixed-architecture approach reduces SVM analysis time from over half a week to less than a single day while using limited GPU memory.
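To make the memory-constrained setting concrete: for an L2-regularized primal objective of the form min_w (1/2) w^T w + C * sum_i xi(w; x_i, y_i), TRON's inner conjugate-gradient loop is dominated by Hessian-vector products of the form Hv = v + C * X^T (D (X v)), where X is the l x n data matrix and D is a diagonal matrix of per-instance curvature terms. When X is too large for GPU memory, this product can still be formed on the GPU by streaming row-chunks of X and accumulating partial results on the device. The following is a minimal CUDA/cuBLAS sketch of that idea only; it is not the paper's implementation or LIBLINEAR's API, and all identifiers (chunked_Hv, scale_by_D, the chunk_rows parameter) are illustrative assumptions. Error checking is omitted for brevity.

```cuda
// Sketch: chunked Hessian-vector product Hv = v + C * X^T (D (X v)),
// streaming row-chunks of a dense, row-major X through limited GPU memory.
// Compile with nvcc and link against -lcublas.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <algorithm>

// Elementwise y[i] *= D[i]; D holds the per-instance curvature terms
// (e.g., sigma_i * (1 - sigma_i) for logistic regression).
__global__ void scale_by_D(float* y, const float* D, int rows) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < rows) y[i] *= D[i];
}

// X_host: l x n, dense, row-major. Only chunk_rows rows of X reside on the
// GPU at a time; partial products accumulate in dHv on the device.
void chunked_Hv(cublasHandle_t h, const float* X_host, const float* D_host,
                const float* v_host, float* Hv_host,
                long l, int n, int chunk_rows, float C) {
    float *dX, *dD, *dv, *dy, *dHv;
    cudaMalloc(&dX, (size_t)chunk_rows * n * sizeof(float));
    cudaMalloc(&dD, chunk_rows * sizeof(float));
    cudaMalloc(&dv, n * sizeof(float));
    cudaMalloc(&dy, chunk_rows * sizeof(float));
    cudaMalloc(&dHv, n * sizeof(float));
    cudaMemcpy(dv, v_host, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dHv, 0, n * sizeof(float));

    const float one = 1.0f, zero = 0.0f;
    for (long r = 0; r < l; r += chunk_rows) {
        int rows = (int)std::min((long)chunk_rows, l - r);
        cudaMemcpy(dX, X_host + r * n, (size_t)rows * n * sizeof(float),
                   cudaMemcpyHostToDevice);
        cudaMemcpy(dD, D_host + r, rows * sizeof(float),
                   cudaMemcpyHostToDevice);
        // Row-major X_chunk (rows x n) is column-major X_chunk^T (n x rows),
        // so OP_T below computes y = X_chunk * v.
        cublasSgemv(h, CUBLAS_OP_T, n, rows, &one, dX, n, dv, 1, &zero, dy, 1);
        // y = D_chunk .* y
        scale_by_D<<<(rows + 255) / 256, 256>>>(dy, dD, rows);
        // Hv += C * X_chunk^T * y  (beta = 1 accumulates across chunks)
        cublasSgemv(h, CUBLAS_OP_N, n, rows, &C, dX, n, dy, 1, &one, dHv, 1);
    }
    cudaMemcpy(Hv_host, dHv, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int j = 0; j < n; ++j) Hv_host[j] += v_host[j];  // the identity term
    cudaFree(dX); cudaFree(dD); cudaFree(dv); cudaFree(dy); cudaFree(dHv);
}
```

In this sketch, chunk_rows is chosen so one chunk fits in device memory; the caller is assumed to have created the cuBLAS handle with cublasCreate. The host-to-device copies are where a mixed architecture plausibly helps: CPU threads can prepare or prefetch the next chunk (and handle any sparse-to-dense conversion) while the GPU processes the current one, for example via CUDA streams and cudaMemcpyAsync.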