Paper Title
Robust Beam Search for Encoder-Decoder Attention Based Speech Recognition without Length Bias
Paper Authors
Paper Abstract
As one popular modeling approach for end-to-end speech recognition, attention-based encoder-decoder models are known to suffer from length bias and the corresponding beam problem. Various approaches have been applied in simple beam search to ease the problem, most of which are heuristic-based and require considerable tuning. We show that heuristics are not a proper modeling refinement and lead to severe performance degradation as the beam size is largely increased. We propose a novel beam search derived by reinterpreting the sequence posterior with explicit length modeling. Applying the reinterpreted probability together with beam pruning yields a final probability that amounts to a robust model modification, allowing reliable comparison among output sequences of different lengths. Experimental verification on the LibriSpeech corpus shows that the proposed approach solves the length bias problem without heuristics or additional tuning effort. It provides robust decision making and consistently good performance under both small and very large beam sizes. Compared with the best results of the heuristic baseline, the proposed approach achieves the same WER on the 'clean' sets and a 4% relative improvement on the 'other' sets. We also show that it is more efficient with an additionally derived early stopping criterion.
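To make the idea concrete, below is a minimal Python sketch of beam search in which finished hypotheses are scored with an explicit length-model term, so that output sequences of different lengths become directly comparable, together with an early-stopping check. This is an illustration under stated assumptions, not the paper's exact derivation: the decoder interface `step`, the length model `log_p_len`, the `EOS` label id, and the toy models in the demo are all hypothetical.

```python
# Sketch: beam search with an explicit length-model term (illustrative only).
# `step(prefix)` is an assumed decoder interface returning (label, log_prob)
# pairs; `log_p_len(n)` is an assumed explicit length model in log space.
import math
from typing import Callable, List, Tuple

EOS = 0  # assumed end-of-sequence label id

def beam_search(step: Callable[[Tuple[int, ...]], List[Tuple[int, float]]],
                log_p_len: Callable[[int], float],
                beam_size: int = 8,
                max_len: int = 50) -> Tuple[int, ...]:
    """Return the best label sequence under log p(y|x) + log p(len(y))."""
    beams: List[Tuple[Tuple[int, ...], float]] = [((), 0.0)]  # (prefix, log-prob)
    finished: List[Tuple[Tuple[int, ...], float]] = []
    for _ in range(max_len):
        candidates = []
        for prefix, lp in beams:
            for label, log_p in step(prefix):  # expand each live hypothesis
                if label == EOS:
                    # Score finished hypotheses with the explicit length-model
                    # term, making different-length sequences comparable.
                    finished.append((prefix, lp + log_p + log_p_len(len(prefix))))
                else:
                    candidates.append((prefix + (label,), lp + log_p))
        if not candidates:
            break
        candidates.sort(key=lambda h: h[1], reverse=True)
        beams = candidates[:beam_size]  # beam pruning
        # Assumed early-stopping check (not the paper's exact criterion): stop
        # once no live hypothesis can beat the best finished score, since its
        # remaining log-probs are <= 0 and the length bonus is bounded above.
        if finished:
            best_done = max(s for _, s in finished)
            bound = max(log_p_len(n) for n in range(max_len + 1))
            if all(lp + bound < best_done for _, lp in beams):
                break
    return max(finished, key=lambda h: h[1])[0] if finished else beams[0][0]

if __name__ == "__main__":
    # Toy decoder: one non-EOS label, with EOS probability rising over time.
    def toy_step(prefix):
        p_eos = min(0.1 + 0.1 * len(prefix), 0.9)
        return [(EOS, math.log(p_eos)), (1, math.log(1.0 - p_eos))]

    # Toy length model peaked at length 5 (an assumption for illustration).
    def toy_len(n):
        return -0.5 * (n - 5) ** 2

    print(beam_search(toy_step, toy_len, beam_size=4))
```

Without the `log_p_len` term, the plain sum of log-probabilities in this sketch would monotonically favor shorter hypotheses as the beam grows, which is exactly the length bias the abstract describes; the explicit length term removes the need for heuristic length normalization or reward tuning.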