Paper Title

PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination

Paper Authors

Saurabh Goyal, Anamitra R. Choudhury, Saurabh M. Raje, Venkatesan T. Chakaravarthy, Yogish Sabharwal, Ashish Verma

Paper Abstract

We develop a novel method, called PoWER-BERT, for improving the inference time of the popular BERT model, while maintaining the accuracy. It works by: a) exploiting redundancy pertaining to word-vectors (intermediate encoder outputs) and eliminating the redundant vectors; b) determining which word-vectors to eliminate by developing a strategy for measuring their significance, based on the self-attention mechanism; c) learning how many word-vectors to eliminate by augmenting the BERT model and the loss function. Experiments on the standard GLUE benchmark show that PoWER-BERT achieves up to 4.5x reduction in inference time over BERT with <1% loss in accuracy. We show that PoWER-BERT offers a significantly better trade-off between accuracy and inference time compared to prior methods. We demonstrate that our method attains up to 6.8x reduction in inference time with <1% loss in accuracy when applied over ALBERT, a highly compressed version of BERT. The code for PoWER-BERT is publicly available at https://github.com/IBM/PoWER-BERT.
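To make the core idea concrete, below is a minimal sketch (not the authors' implementation; see the repository above for that) of attention-based word-vector elimination: each position is scored by the total self-attention it receives across heads and query positions, and only the top-k word-vectors survive into the next encoder layer. The tensor shapes, the fixed `k`, and the helper names `significance_scores` and `keep_top_k` are illustrative assumptions; in PoWER-BERT the number of retained vectors per layer is itself learned via the augmented loss.

```python
# Sketch of PoWER-BERT-style word-vector elimination (illustrative, not the
# official implementation). Assumes standard BERT-shaped tensors.
import torch

def significance_scores(attention_probs: torch.Tensor) -> torch.Tensor:
    """attention_probs: (batch, heads, seq_len, seq_len) softmax outputs.
    Returns (batch, seq_len): the attention each position *receives*,
    summed over all heads and all query positions."""
    return attention_probs.sum(dim=(1, 2))

def keep_top_k(hidden_states: torch.Tensor,
               attention_probs: torch.Tensor,
               k: int) -> torch.Tensor:
    """hidden_states: (batch, seq_len, hidden). Keeps the k most significant
    word-vectors per example and drops the rest, so every subsequent layer
    operates on a shorter sequence (this is where the speedup comes from)."""
    scores = significance_scores(attention_probs)             # (batch, seq_len)
    top = scores.topk(k, dim=1).indices.sort(dim=1).values    # keep word order
    idx = top.unsqueeze(-1).expand(-1, -1, hidden_states.size(-1))
    return hidden_states.gather(1, idx)                       # (batch, k, hidden)

# Example: a batch of 2 sequences of length 128 reduced to 64 surviving vectors.
h = torch.randn(2, 128, 768)
att = torch.softmax(torch.randn(2, 12, 128, 128), dim=-1)
print(keep_top_k(h, att, k=64).shape)  # torch.Size([2, 64, 768])
```

Because elimination is progressive, applying this after each encoder layer with a decreasing retention schedule (e.g., 128 → 64 → 32 positions) compounds the savings, which is how reductions of 4.5x and beyond in inference time become possible.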
