ASR Error Correction and Domain Adaptation Using Machine Translation

Paper Title

ASR Error Correction and Domain Adaptation Using Machine Translation

Authors

Anirudh Mani, Shruti Palaskar, Nimshi Venkat Meripo, Sandeep Konam, Florian Metze

Abstract

Off-the-shelf pre-trained Automatic Speech Recognition (ASR) systems are an increasingly viable service for companies of any size building speech-based products. Although these ASR systems are trained on large amounts of data, domain mismatch remains an issue for many parties that want to use the service as-is, leading to suboptimal results on their task. We propose a simple technique to perform domain adaptation for ASR error correction via machine translation. The machine translation model is a strong candidate to learn a mapping from out-of-domain ASR errors to in-domain terms in the corresponding reference files. We use two off-the-shelf ASR systems in this work: Google ASR (commercial) and the ASPIRE model (open-source). With our proposed method, we observe a 7% absolute improvement in word error rate and a 4-point absolute improvement in BLEU score on Google ASR output. We also evaluate ASR error correction via a downstream task of Speaker Diarization that captures the speaker style, syntax, structure, and semantic improvements we obtain via ASR correction.
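
For readers unfamiliar with the metric, the following is a minimal sketch of how word error rate is conventionally computed: the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. An "absolute improvement" is then the plain difference between two such rates. The function name `word_error_rate` and the example sentences are hypothetical and chosen only for illustration; this is not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences, not taken from the paper's data.
reference = "the patient has a mild fever"
raw_asr = "the patient has a mile fever"
corrected = "the patient has a mild fever"

# Absolute improvement is the difference between the two error rates.
absolute_improvement = (
    word_error_rate(reference, raw_asr) - word_error_rate(reference, corrected)
)
```

In this toy case the raw output has one substituted word out of six, and the corrected output matches the reference, so the absolute improvement equals the raw error rate.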
