Paper Title

FPRaker: A Processing Element For Accelerating Neural Network Training

Authors

Omar Mohamed Awad, Mostafa Mahmoud, Isak Edo, Ali Hadi Zadeh, Ciaran Bannon, Anand Jayarajan, Gennady Pekhimenko, Andreas Moshovos

Abstract

We present FPRaker, a processing element for composing training accelerators. FPRaker processes several floating-point multiply-accumulation operations concurrently and accumulates their result into a higher precision accumulator. FPRaker boosts performance and energy efficiency during training by taking advantage of the values that naturally appear during training. Specifically, it processes the significand of the operands of each multiply-accumulate as a series of signed powers of two. The conversion to this form is done on-the-fly. This exposes ineffectual work that can be skipped: values when encoded have few terms and some of them can be discarded as they would fall outside the range of the accumulator given the limited precision of floating-point. We demonstrate that FPRaker can be used to compose an accelerator for training and that it can improve performance and energy efficiency compared to using conventional floating-point units under ISO-compute area constraints. We also demonstrate that FPRaker delivers additional benefits when training incorporates pruning and quantization. Finally, we show that FPRaker naturally amplifies performance with training methods that use a different precision per layer.
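The key mechanism the abstract describes, recoding each significand on-the-fly into a short series of signed powers of two so that ineffectual terms can be skipped, can be illustrated with a minimal sketch. This is not FPRaker's hardware recoder; the function name and the pure-Python canonical signed-digit encoding below are illustrative assumptions only:

```python
def signed_power_terms(significand: int) -> list[int]:
    """Recode a non-negative integer significand as a list of signed
    powers of two (non-adjacent / canonical signed-digit form).

    A run of 1-bits such as 0b0111 becomes two terms (+8, -1) instead
    of three (+4, +2, +1), so this encoding never has more terms than
    the plain binary expansion.  Illustrative sketch only -- not
    FPRaker's actual hardware recoder.
    """
    terms = []
    bit = 0
    while significand:
        if significand & 1:
            if significand & 3 == 3:
                # Start of a run of 1s: emit -2^bit and carry upward.
                terms.append(-(1 << bit))
                significand += 1
            else:
                # Isolated 1-bit: emit +2^bit.
                terms.append(1 << bit)
                significand -= 1
        significand >>= 1
        bit += 1
    return terms

# 0b0111 (= 7) needs three terms in binary but only two here.
print(signed_power_terms(0b0111))  # [-1, 8]
```

In FPRaker, terms produced this way that would fall below the accumulator's least-significant representable bit (given floating-point's limited precision) can additionally be discarded, which is the second source of skipped work the abstract mentions.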
