Paper Title

Distilling the Knowledge of BERT for Sequence-to-Sequence ASR

Paper Authors

Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

Paper Abstract

Attention-based sequence-to-sequence (seq2seq) models have achieved promising results in automatic speech recognition (ASR). However, as these models decode in a left-to-right way, they do not have access to context on the right. We leverage both left and right context by applying BERT as an external language model to seq2seq ASR through knowledge distillation. In our proposed method, BERT generates soft labels to guide the training of seq2seq ASR. Furthermore, we leverage context beyond the current utterance as input to BERT. Experimental evaluations show that our method significantly improves the ASR performance from the seq2seq baseline on the Corpus of Spontaneous Japanese (CSJ). Knowledge distillation from BERT outperforms that from a transformer LM that only looks at left context. We also show the effectiveness of leveraging context beyond the current utterance. Our method outperforms other LM application approaches such as n-best rescoring and shallow fusion, while it does not require extra inference cost.
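The abstract describes knowledge distillation in which BERT produces soft labels that guide the training of the seq2seq ASR model. Below is a minimal sketch of such a soft-label distillation objective, assuming PyTorch; the tensor shapes, interpolation weight `alpha`, and `temperature` are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of soft-label distillation from BERT into a seq2seq ASR
# decoder, assuming PyTorch. Shapes and hyperparameters are assumptions.
import torch
import torch.nn.functional as F


def distillation_loss(asr_logits, bert_soft_labels, targets,
                      alpha=0.5, temperature=1.0):
    """Combine hard-label cross-entropy with a soft-label term from BERT.

    asr_logits:       (batch, seq_len, vocab) decoder outputs of the ASR model
    bert_soft_labels: (batch, seq_len, vocab) per-token distributions from BERT,
                      which sees both left and right context of each position
    targets:          (batch, seq_len) ground-truth token ids
    """
    vocab = asr_logits.size(-1)

    # Standard cross-entropy against the reference transcript (hard labels).
    ce = F.cross_entropy(asr_logits.view(-1, vocab), targets.view(-1))

    # KL divergence pulling the ASR output distribution toward BERT's soft labels.
    log_probs = F.log_softmax(asr_logits / temperature, dim=-1)
    kl = F.kl_div(log_probs, bert_soft_labels, reduction="batchmean")

    # Interpolate the two terms; alpha is an assumed hyperparameter.
    return (1.0 - alpha) * ce + alpha * kl
```

Because BERT is only consulted to produce soft labels during training, decoding proceeds exactly as in the baseline seq2seq model, which is why the abstract notes that no extra inference cost is required.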
