Paper Title

Towards Better Meta-Initialization with Task Augmentation for Kindergarten-aged Speech Recognition

Paper Authors

Yunzheng Zhu, Ruchao Fan, Abeer Alwan

Abstract

Children's automatic speech recognition (ASR) is always difficult due to, in part, the data scarcity problem, especially for kindergarten-aged kids. When data are scarce, the model might overfit to the training data, and hence good starting points for training are essential. Recently, meta-learning was proposed to learn model initialization (MI) for ASR tasks of different languages. This method leads to good performance when the model is adapted to an unseen language. However, MI is vulnerable to overfitting on training tasks (learner overfitting). It is also unknown whether MI generalizes to other low-resource tasks. In this paper, we validate the effectiveness of MI in children's ASR and attempt to alleviate the problem of learner overfitting. To achieve model-agnostic meta-learning (MAML), we regard children's speech at each age as a different task. In terms of learner overfitting, we propose a task-level augmentation method by simulating new ages using frequency warping techniques. Detailed experiments are conducted to show the impact of task augmentation on each age for kindergarten-aged speech. As a result, our approach achieves a relative word error rate (WER) improvement of 51% over the baseline system with no augmentation or initialization.
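The task-level augmentation described above, simulating new "ages" by warping the frequency axis of the speech features, can be sketched as follows. This is a minimal NumPy illustration of the idea only, not the authors' implementation: the function names and warping-factor values are hypothetical, and practical systems often warp at the filterbank level (e.g. VTLP) rather than interpolating a spectrogram directly.

```python
import numpy as np

def warp_frequency(spec, alpha):
    # Linear frequency warping of a (freq_bins, frames) spectrogram:
    # output bin k is read from source position k / alpha, so
    # alpha > 1 shifts spectral content toward higher bins
    # (mimicking a shorter vocal tract, i.e. a younger speaker)
    # and alpha < 1 shifts it downward.
    n_bins, n_frames = spec.shape
    bins = np.arange(n_bins, dtype=float)
    src = np.clip(bins / alpha, 0.0, n_bins - 1.0)
    warped = np.empty_like(spec, dtype=float)
    for t in range(n_frames):
        warped[:, t] = np.interp(src, bins, spec[:, t])
    return warped

def simulate_age_tasks(spec, alphas=(0.9, 1.0, 1.1)):
    # Each warping factor acts as a pseudo "age" task that can be
    # added to the MAML task pool alongside the real per-age tasks.
    return {alpha: warp_frequency(spec, alpha) for alpha in alphas}
```

With `alpha = 1.0` the spectrogram is returned unchanged, so the original task is a special case of the augmented task set.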
