Cross-Modal ASR Post-Processing System for Error Correction and Utterance Rejection

Paper Title


Cross-Modal ASR Post-Processing System for Error Correction and Utterance Rejection

Authors

Du, Jing; Pu, Shiliang; Dong, Qinbo; Jin, Chao; Qi, Xin; Gu, Dian; Wu, Ru; Zhou, Hongwei

Abstract


Although modern automatic speech recognition (ASR) systems can achieve high performance, they still make errors that degrade the reader's experience and harm downstream tasks. To improve the accuracy and reliability of ASR hypotheses, we propose a cross-modal post-processing system for speech recognizers that 1) fuses acoustic and textual features from different modalities, 2) jointly trains a confidence estimator and an error corrector in a multi-task learning fashion, and 3) unifies the error correction and utterance rejection modules. Compared with single-modal or single-task models, the proposed system proves more effective and efficient. Experimental results show that our post-processing system yields a relative reduction of more than 10% in character error rate (CER) for both single-speaker and multi-speaker speech on our industrial ASR system, with a latency of about 1.7 ms per token, which keeps the extra latency introduced by post-processing acceptable for streaming speech recognition.
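The abstract gives no implementation details, so the following Python sketch is illustrative only and is not the authors' method. It shows how the reported metric (a relative CER reduction) is commonly computed, together with two hypothetical pieces: a multi-task objective that adds a weighted binary cross-entropy confidence loss to a corrector loss, and a rejection rule that drops an utterance when its mean token confidence falls below a threshold. The function names (`edit_distance`, `cer`, `relative_reduction`, `confidence_bce`, `joint_loss`, `post_process`), the mean-confidence rule, the default threshold of 0.5, and the loss weight are all assumptions made for this example.

```python
# Illustrative sketch only; not the authors' implementation.
import math
from typing import List, Optional, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference char r
                curr[j - 1] + 1,         # insertion of hypothesis char h
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def cer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level CER: total character edits / total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


def relative_reduction(baseline: float, system: float) -> float:
    """Relative error-rate reduction; 0.10 means a 10% relative reduction."""
    return (baseline - system) / baseline


def confidence_bce(conf: List[float], labels: List[int]) -> float:
    """Mean binary cross-entropy between token confidences and 0/1 correctness labels."""
    eps = 1e-7  # clamp probabilities to avoid log(0)
    total = sum(
        -(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps)))
        for p, y in zip(conf, labels)
    )
    return total / len(conf)


def joint_loss(correction_nll: float, conf: List[float], labels: List[int],
               weight: float = 1.0) -> float:
    """Hypothetical multi-task objective: corrector loss + weighted confidence loss."""
    return correction_nll + weight * confidence_bce(conf, labels)


def post_process(corrected: str, token_conf: List[float],
                 threshold: float = 0.5) -> Optional[str]:
    """Hypothetical unified step: reject (return None) when the mean token
    confidence is below `threshold`, otherwise emit the corrected text."""
    if not token_conf or sum(token_conf) / len(token_conf) < threshold:
        return None
    return corrected
```

As a toy example with made-up strings (not data from the paper): for the reference `"abcdefghij"`, a baseline hypothesis `"abcxefghiz"` (2 substitutions) has a CER of 0.2, and a post-processed hypothesis `"abcdefghiz"` (1 substitution) has a CER of 0.1, so `relative_reduction(0.2, 0.1)` gives 0.5, a 50% relative reduction. The abstract's "more than 10%" claim refers to this relative quantity, not to an absolute drop in CER.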
