Paper Title
Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps
Paper Authors
Paper Abstract
Modern neural network architectures use structured linear transformations, such as low-rank matrices, sparse matrices, permutations, and the Fourier transform, to improve inference speed and reduce memory usage compared to general linear maps. However, choosing which of the myriad structured transformations to use (and its associated parameterization) is a laborious task that requires trading off speed, space, and accuracy. We consider a different approach: we introduce a family of matrices called kaleidoscope matrices (K-matrices) that provably capture any structured matrix with near-optimal space (parameter) and time (arithmetic operation) complexity. We empirically validate that K-matrices can be automatically learned within end-to-end pipelines to replace hand-crafted procedures, in order to improve model quality. For example, replacing channel shuffles in ShuffleNet improves classification accuracy on ImageNet by up to 5%. K-matrices can also simplify hand-engineered pipelines -- we replace filter bank feature computation in speech data preprocessing with a learnable kaleidoscope layer, resulting in only 0.4% loss in accuracy on the TIMIT speech recognition task. In addition, K-matrices can capture latent structure in models: for a challenging permuted image classification task, a K-matrix based representation of permutations is able to learn the right latent structure and improves accuracy of a downstream convolutional model by over 9%. We provide a practically efficient implementation of our approach, and use K-matrices in a Transformer network to attain 36% faster end-to-end inference speed on a language translation task.
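The K-matrix construction builds on butterfly factorizations: a matrix of size n = 2^m is expressed as a product of log n sparse factors, each block-diagonal with blocks made of four diagonal sub-blocks, so a matrix-vector product costs O(n log n) instead of O(n^2). The following NumPy sketch illustrates that idea only; the function names and the dense-factor helper are hypothetical and do not reflect the paper's actual implementation or API.

```python
import numpy as np

def apply_butterfly_factor(d, x, b):
    """Apply one butterfly factor with block size b to vector x in O(n).

    d: array of shape (4, n//2) holding the four diagonals of every block.
    Each size-b block maps halves (u, v) to (d0*u + d1*v, d2*u + d3*v).
    """
    n = x.size
    h = b // 2
    X = x.reshape(n // b, 2, h)          # split x into blocks, then halves
    D = d.reshape(4, n // b, h)          # per-block diagonals
    top = D[0] * X[:, 0] + D[1] * X[:, 1]
    bot = D[2] * X[:, 0] + D[3] * X[:, 1]
    return np.stack([top, bot], axis=1).reshape(n)

def dense_butterfly_factor(d, n, b):
    """Materialize the same factor as a dense n-by-n matrix (for checking)."""
    h = b // 2
    D = d.reshape(4, n // b, h)
    B = np.zeros((n, n))
    for i in range(n // b):
        s = i * b
        B[s:s+h,   s:s+h]   = np.diag(D[0, i])
        B[s:s+h,   s+h:s+b] = np.diag(D[1, i])
        B[s+h:s+b, s:s+h]   = np.diag(D[2, i])
        B[s+h:s+b, s+h:s+b] = np.diag(D[3, i])
    return B

# A full butterfly matrix is the product B_n @ ... @ B_4 @ B_2;
# the rightmost factor (smallest blocks) acts on the vector first.
rng = np.random.default_rng(0)
n = 8
blocks = [8, 4, 2]
diags = [rng.standard_normal((4, n // 2)) for _ in blocks]
x = rng.standard_normal(n)

y = x
for b, d in zip(reversed(blocks), reversed(diags)):
    y = apply_butterfly_factor(d, y, b)   # O(n log n) total

M = np.linalg.multi_dot(
    [dense_butterfly_factor(d, n, b) for b, d in zip(blocks, diags)])
assert np.allclose(y, M @ x)              # fast apply matches dense product
```

A K-matrix then composes such butterfly matrices (and their transposes), with the diagonals as learnable parameters trained end-to-end like any other weights.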