Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach

Title

Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach

Authors

Xulong Zhang, Jianzong Wang, Ning Cheng, Kexin Zhu, Jing Xiao

Abstract

Recovering masked speech frames is widely used in speech representation learning. However, most of these models use random masking during pre-training. In this work, we propose two masking approaches: (1) speech-level masking, which makes the model mask more speech segments than silence segments, and (2) phoneme-level masking, which forces the model to mask all frames of a phoneme instead of phoneme pieces. We pre-trained the model with these two approaches and evaluated it on two downstream tasks, phoneme classification and speaker recognition. The experiments demonstrate that the proposed masking approaches improve the performance of the learned speech representations.
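The two masking strategies in the abstract can be illustrated with a minimal Python sketch. It assumes that frame-level voice-activity labels (1 = speech, 0 = silence) and a frame-level phoneme alignment (for example from a forced aligner) are already available. The function names, the 15% masking ratio, the span length and the `speech_weight` bias are hypothetical choices for illustration, not the paper's actual settings or implementation.

```python
import random
from typing import List, Tuple


def speech_level_mask(vad: List[int], mask_ratio: float = 0.15, span: int = 7,
                      speech_weight: float = 4.0, seed: int = 0) -> List[bool]:
    """Hypothetical sketch: span masking biased toward speech frames.

    vad[i] is 1 for a speech frame and 0 for a silence frame (assumed VAD output).
    Each speech frame is `speech_weight` times as likely as a silence frame to be
    picked as a span start, so speech receives more masks than silence.
    """
    if span < 1 or speech_weight <= 0 or not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("need span >= 1, speech_weight > 0, mask_ratio in [0, 1]")
    rng = random.Random(seed)
    n = len(vad)
    target = int(round(mask_ratio * n))
    weights = [speech_weight if v else 1.0 for v in vad]
    mask = [False] * n
    masked = 0
    while masked < target:
        # Draw one span start, weighted toward speech frames.
        start = rng.choices(range(n), weights=weights, k=1)[0]
        for i in range(start, min(start + span, n)):
            if not mask[i]:
                mask[i] = True
                masked += 1
    return mask


def phoneme_segments(phones: List[int]) -> List[Tuple[int, int]]:
    """Split a frame-level phoneme alignment into (start, end) runs, end exclusive.

    Note: adjacent repeats of the same phoneme label merge into one run; real
    aligner segment boundaries would avoid that.
    """
    segments = []
    start = 0
    for i in range(1, len(phones) + 1):
        if i == len(phones) or phones[i] != phones[start]:
            segments.append((start, i))
            start = i
    return segments


def phoneme_level_mask(phones: List[int], mask_ratio: float = 0.15,
                       seed: int = 0) -> List[bool]:
    """Hypothetical sketch: mask whole phonemes, never partial phoneme pieces."""
    rng = random.Random(seed)
    n = len(phones)
    target = int(round(mask_ratio * n))
    segments = phoneme_segments(phones)
    rng.shuffle(segments)
    mask = [False] * n
    masked = 0
    for start, end in segments:
        if masked >= target:
            break
        # Mask every frame of the selected phoneme.
        for i in range(start, end):
            mask[i] = True
        masked += end - start
    return mask
```

Setting `speech_weight=1.0` makes every frame an equally likely span start, which reduces `speech_level_mask` to plain random span masking, the kind of baseline the abstract contrasts against. In a real pipeline the returned boolean mask would mark which input frames are hidden before the model is asked to reconstruct them.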
