Paper Title

RSC: Accelerating Graph Neural Networks Training via Randomized Sparse Computations

Paper Authors

Zirui Liu, Shengyuan Chen, Kaixiong Zhou, Daochen Zha, Xiao Huang, Xia Hu

Paper Abstract

The training of graph neural networks (GNNs) is extremely time consuming because sparse graph-based operations are hard to accelerate with hardware. Prior art explores trading off computational precision to reduce time complexity via sampling-based approximation. Based on this idea, previous works successfully accelerate dense matrix-based operations (e.g., convolution and linear) with negligible accuracy drop. However, unlike dense matrices, sparse matrices are stored in an irregular data format such that each row/column may have a different number of non-zero entries. Thus, compared to their dense counterparts, approximating sparse operations poses two unique challenges: (1) we cannot directly control the efficiency of an approximated sparse operation, since the computation is only executed on non-zero entries; (2) sub-sampling sparse matrices is much more inefficient due to the irregular data format. To address these issues, our key idea is to control the accuracy-efficiency trade-off by optimizing the computation resource allocation layer-wise and epoch-wise. Specifically, for the first challenge, we customize the computation resource for different sparse operations while limiting the total resource used to a certain budget. For the second challenge, we cache previously sampled sparse matrices to reduce the epoch-wise sampling overhead. Finally, we propose a switching mechanism to improve the generalization of GNNs trained with approximated operations. To this end, we propose Randomized Sparse Computation (RSC), which for the first time demonstrates the potential of training GNNs with approximated operations. In practice, RSC can achieve up to an $11.6\times$ speedup for a single sparse operation and a $1.6\times$ end-to-end wall-clock time speedup with negligible accuracy drop.
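To make the idea concrete, below is a minimal sketch (not the authors' implementation) of the two mechanisms the abstract describes: approximating a sparse-dense matrix multiplication A @ H by sub-sampling columns of the sparse adjacency matrix, and caching the sampled matrix so later epochs can skip the sampling overhead. The function name `approx_spmm`, the uniform column-sampling scheme with `n / num_samples` rescaling, and the `cache` dictionary are illustrative assumptions; RSC's actual layer-wise budget allocation and switching mechanism are not shown.

```python
import torch

def approx_spmm(A, H, num_samples, cache=None, cache_key=None, refresh=True):
    """Approximate A @ H (A: sparse COO adjacency, H: dense features) by
    keeping only `num_samples` uniformly sampled columns of A.

    Illustrative sketch of sampling-based approximation with epoch-wise
    caching; hypothetical API, not the paper's actual implementation.
    """
    n = A.shape[1]
    if cache is not None and not refresh and cache_key in cache:
        # Reuse the sub-matrix sampled in a previous epoch to avoid paying
        # the sampling overhead on the irregular sparse format every epoch.
        A_sub = cache[cache_key]
    else:
        A = A.coalesce()
        idx = torch.randperm(n)[:num_samples]          # uniform, w/o replacement
        keep = torch.isin(A.indices()[1], idx)         # drop non-sampled columns
        A_sub = torch.sparse_coo_tensor(
            A.indices()[:, keep], A.values()[keep], A.shape
        ).coalesce()
        if cache is not None and cache_key is not None:
            cache[cache_key] = A_sub
    # Rescale by n / num_samples so the estimate is unbiased in expectation.
    return torch.sparse.mm(A_sub, H) * (n / num_samples)
```

In a GNN layer, such a routine would stand in for the exact `torch.sparse.mm(adj, features)` call during most training epochs, with the exact operation switched back in occasionally (for example, near the end of training), in the spirit of the switching mechanism described in the abstract.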
