Paper Title

Boosting Active Learning for Speech Recognition with Noisy Pseudo-labeled Samples

Paper Authors

Jihwan Bang, Heesu Kim, YoungJoon Yoo, Jung-Woo Ha

Abstract

The cost of annotating transcriptions for large speech corpora is a bottleneck that prevents deep neural network-based automatic speech recognition (ASR) models from reaching their full potential. In this paper, we present a new training pipeline that boosts conventional active learning approaches targeting label-efficient learning, to address this problem. Existing active learning methods focus only on selecting a set of informative samples under a labeling budget. Going one step further, we show that training efficiency can be improved by also utilizing the unlabeled samples that exceed the labeling budget, through a carefully configured unsupervised loss that effectively complements the supervised loss. We propose a new unsupervised loss based on consistency regularization, and we configure appropriate augmentation techniques for utterances so that consistency regularization can be adopted in the ASR task. Through qualitative and quantitative experiments on a real-world dataset and under realistic usage scenarios, we show that the proposed training pipeline boosts the efficacy of active learning approaches, successfully reducing a substantial amount of human labeling cost.
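The core idea of the abstract — an unsupervised consistency loss over unlabeled utterances — can be sketched in a minimal, stdlib-only form. This is an illustration, not the paper's exact formulation: it assumes a KL-divergence consistency term between the model's per-frame label posteriors on a weakly and a strongly augmented view of the same utterance, gated by a hypothetical confidence threshold so that only confident pseudo-labels contribute.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def consistency_loss(probs_weak, probs_strong, threshold=0.9):
    """Unsupervised consistency loss over the frames of one unlabeled utterance.

    probs_weak / probs_strong: per-frame label posteriors from the model, computed
    on a weakly and a strongly augmented view of the same utterance. Frames whose
    weak-view posterior is below the confidence threshold are ignored, so noisy
    pseudo-labels do not dominate the loss. The threshold value is illustrative.
    """
    losses = []
    for p_w, p_s in zip(probs_weak, probs_strong):
        if max(p_w) >= threshold:  # confident pseudo-label: enforce consistency
            losses.append(kl_divergence(p_w, p_s))
    return sum(losses) / max(len(losses), 1)  # mean over contributing frames

# Toy example: two frames over a 2-symbol vocabulary. Only the first frame
# is confident enough to contribute to the loss.
weak = [[0.95, 0.05], [0.5, 0.5]]
strong = [[0.6, 0.4], [0.5, 0.5]]
loss = consistency_loss(weak, strong)
```

In a real pipeline this term would be added, with a weighting coefficient, to the supervised CTC or attention loss computed on the actively selected labeled samples.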
