Paper Title

SGD_Tucker: A Novel Stochastic Optimization Strategy for Parallel Sparse Tucker Decomposition

Paper Authors

Hao Li, Zixuan Li, Kenli Li, Jan S. Rellermeyer, Lydia Y. Chen, Keqin Li

Paper Abstract

Sparse Tucker Decomposition (STD) algorithms learn a core tensor and a group of factor matrices to obtain an optimal low-rank representation feature for the \underline{H}igh-\underline{O}rder, \underline{H}igh-\underline{D}imension, and \underline{S}parse \underline{T}ensor (HOHDST). However, existing STD algorithms face the problem of intermediate variable explosion, which results from the fact that the formation of those variables, i.e., matrix Khatri-Rao products, Kronecker products, and matrix-matrix multiplications, follows the whole elements in the sparse tensor. This problem prevents the deep fusion of efficient computation and big-data platforms. To overcome the bottleneck, a novel stochastic optimization strategy (SGD$\_$Tucker) is proposed for STD, which can automatically divide the high-dimension intermediate variables into small batches of intermediate matrices. Specifically, SGD$\_$Tucker only follows randomly selected small samples rather than the whole elements, while maintaining the overall accuracy and convergence rate. In practice, SGD$\_$Tucker features two distinct advancements over the state of the art. First, SGD$\_$Tucker can prune the communication overhead for the core tensor in distributed settings. Second, the low data dependence of SGD$\_$Tucker enables fine-grained parallelization, which allows SGD$\_$Tucker to obtain lower computational overheads with the same accuracy. Experimental results show that SGD$\_$Tucker runs at least 2$\times$ faster than the state of the art.
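
The core idea described above — updating the factor matrices and the core tensor from randomly sampled observed entries instead of materializing the full Khatri-Rao/Kronecker intermediates — can be illustrated with a minimal sketch. The snippet below is not the authors' SGD$\_$Tucker implementation; it assumes a third-order tensor, a squared-error objective with L2 regularization, and single-entry stochastic updates, and all names (sgd_sparse_tucker, coords, vals, ranks) are illustrative.

```python
import numpy as np

def sgd_sparse_tucker(coords, vals, shape, ranks, lr=0.01, reg=0.01,
                      n_epochs=20, seed=0):
    """Illustrative SGD for sparse Tucker decomposition of a 3rd-order tensor.

    coords : (nnz, 3) int array of observed indices (i, j, k)
    vals   : (nnz,)   observed tensor values
    shape  : (I, J, K) tensor dimensions
    ranks  : (R1, R2, R3) Tucker ranks
    """
    rng = np.random.default_rng(seed)
    I, J, K = shape
    R1, R2, R3 = ranks
    # Random small-scale initialization of the factor matrices and core tensor.
    A = 0.1 * rng.standard_normal((I, R1))
    B = 0.1 * rng.standard_normal((J, R2))
    C = 0.1 * rng.standard_normal((K, R3))
    G = 0.1 * rng.standard_normal((R1, R2, R3))

    for _ in range(n_epochs):
        for n in rng.permutation(len(vals)):   # one randomly sampled entry per update
            i, j, k = coords[n]
            a, b, c = A[i], B[j], C[k]
            # Predicted entry: sum_{p,q,r} G[p,q,r] * a[p] * b[q] * c[r]
            pred = np.einsum('pqr,p,q,r->', G, a, b, c)
            err = vals[n] - pred
            # Gradients of 0.5*err^2 + 0.5*reg*||.||^2, computed only from the
            # rows touched by this sample plus the (small) core tensor.
            grad_a = -err * np.einsum('pqr,q,r->p', G, b, c) + reg * a
            grad_b = -err * np.einsum('pqr,p,r->q', G, a, c) + reg * b
            grad_c = -err * np.einsum('pqr,p,q->r', G, a, b) + reg * c
            grad_G = -err * np.einsum('p,q,r->pqr', a, b, c) + reg * G
            A[i] -= lr * grad_a
            B[j] -= lr * grad_b
            C[k] -= lr * grad_c
            G -= lr * grad_G
    return G, (A, B, C)
```

Because each update touches only one row of each factor matrix plus the small core tensor, updates on samples with disjoint indices have little data dependence, which is the property the abstract credits for fine-grained parallelization and reduced core-tensor communication.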
