Paper Title
Smart Speech Segmentation using Acousto-Linguistic Features with look-ahead
Paper Authors
Paper Abstract
Segmentation for continuous Automatic Speech Recognition (ASR) has traditionally used silence timeouts or voice activity detectors (VADs), which are both limited to acoustic features. This segmentation is often overly aggressive, given that people naturally pause to think as they speak. Consequently, segmentation happens mid-sentence, hindering both punctuation and downstream tasks like machine translation for which high-quality segmentation is critical. Model-based segmentation methods that leverage acoustic features are powerful, but without an understanding of the language itself, these approaches are limited. We present a hybrid approach that leverages both acoustic and language information to improve segmentation. Furthermore, we show that including one word as a look-ahead boosts segmentation quality. On average, our models improve segmentation-F0.5 score by 9.8% over baseline. We show that this approach works for multiple languages. For the downstream task of machine translation, it improves the translation BLEU score by an average of 1.05 points.
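To make the idea concrete, below is a minimal Python sketch of the kind of hybrid boundary decision the abstract describes: an acoustic pause feature and a language-based boundary score conditioned on one look-ahead word are combined before committing to a segment break. The names, weights, and the toy boundary heuristic are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumptions, not the paper's method) of a hybrid
# acoustic + linguistic segmentation decision with one-word look-ahead.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    word: str
    pause_after_ms: float  # silence following this word, e.g. from a VAD


def boundary_probability(history: List[str], look_ahead: Optional[str]) -> float:
    """Hypothetical stand-in for a learned P(boundary | history, look-ahead).

    A real system would query a trained model here; this stub only
    illustrates that the decision sees one word of look-ahead.
    """
    if look_ahead is None:           # end of stream: always a boundary
        return 1.0
    if look_ahead[:1].isupper():     # toy heuristic in place of the model
        return 0.7
    return 0.2


def segment(tokens: List[Token],
            pause_weight: float = 0.4,
            lm_weight: float = 0.6,
            threshold: float = 0.5) -> List[List[str]]:
    """Greedy hybrid segmentation over a recognized token stream."""
    segments, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok.word)
        look_ahead = tokens[i + 1].word if i + 1 < len(tokens) else None
        # Acoustic evidence: longer pauses push the score toward a boundary.
        pause_score = min(tok.pause_after_ms / 1000.0, 1.0)
        # Linguistic evidence: boundary probability given one look-ahead word.
        lm_score = boundary_probability(current, look_ahead)
        if pause_weight * pause_score + lm_weight * lm_score >= threshold:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    stream = [Token("i", 50), Token("think", 600), Token("we", 40),
              Token("should", 30), Token("go", 900), Token("It", 100),
              Token("works", 1200)]
    print(segment(stream))
    # [['i', 'think', 'we', 'should', 'go'], ['It', 'works']]
```

In this sketch the 600 ms thinking pause after "think" does not trigger a break because the linguistic score disagrees, while the pause after "go" does, which is the behavior the abstract argues for. The reported metric, segmentation F0.5, is the standard F-beta score with beta = 0.5, weighting boundary precision more heavily than recall.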