Paper Title

PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination

Paper Authors

Saurabh Goyal, Anamitra R. Choudhury, Saurabh M. Raje, Venkatesan T. Chakaravarthy, Yogish Sabharwal, Ashish Verma

Paper Abstract

We develop a novel method, called PoWER-BERT, for improving the inference time of the popular BERT model, while maintaining the accuracy. It works by: a) exploiting redundancy pertaining to word-vectors (intermediate encoder outputs) and eliminating the redundant vectors; b) determining which word-vectors to eliminate by developing a strategy for measuring their significance, based on the self-attention mechanism; c) learning how many word-vectors to eliminate by augmenting the BERT model and the loss function. Experiments on the standard GLUE benchmark show that PoWER-BERT achieves up to 4.5x reduction in inference time over BERT with <1% loss in accuracy. We show that PoWER-BERT offers a significantly better trade-off between accuracy and inference time compared to prior methods. We demonstrate that our method attains up to 6.8x reduction in inference time with <1% loss in accuracy when applied over ALBERT, a highly compressed version of BERT. The code for PoWER-BERT is publicly available at https://github.com/IBM/PoWER-BERT.
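To make the core idea concrete, below is a minimal sketch (not the authors' implementation; see the repository above for that) of attention-based word-vector elimination: each position is scored by the total self-attention it receives across heads and query positions, and only the top-k word-vectors survive into the next encoder layer. The tensor shapes, the fixed `k`, and the helper names `significance_scores` and `keep_top_k` are illustrative assumptions; in PoWER-BERT the number of retained vectors per layer is itself learned via the augmented loss.

```python
# Sketch of PoWER-BERT-style word-vector elimination (illustrative, not the
# official implementation). Assumes standard BERT-shaped tensors.
import torch

def significance_scores(attention_probs: torch.Tensor) -> torch.Tensor:
    """attention_probs: (batch, heads, seq_len, seq_len) softmax outputs.
    Returns (batch, seq_len): the attention each position *receives*,
    summed over all heads and all query positions."""
    return attention_probs.sum(dim=(1, 2))

def keep_top_k(hidden_states: torch.Tensor,
               attention_probs: torch.Tensor,
               k: int) -> torch.Tensor:
    """hidden_states: (batch, seq_len, hidden). Keeps the k most significant
    word-vectors per example and drops the rest, so every subsequent layer
    operates on a shorter sequence (this is where the speedup comes from)."""
    scores = significance_scores(attention_probs)             # (batch, seq_len)
    top = scores.topk(k, dim=1).indices.sort(dim=1).values    # keep word order
    idx = top.unsqueeze(-1).expand(-1, -1, hidden_states.size(-1))
    return hidden_states.gather(1, idx)                       # (batch, k, hidden)

# Example: a batch of 2 sequences of length 128 reduced to 64 surviving vectors.
h = torch.randn(2, 128, 768)
att = torch.softmax(torch.randn(2, 12, 128, 128), dim=-1)
print(keep_top_k(h, att, k=64).shape)  # torch.Size([2, 64, 768])
```

Because elimination is progressive, applying this after each encoder layer with a decreasing retention schedule (e.g., 128 → 64 → 32 positions) compounds the savings, which is how reductions of 4.5x and beyond in inference time become possible.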
