Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention

Paper title

Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention

Authors

Yuma Koizumi, Kohei Yatabe, Marc Delcroix, Yoshiki Masuyama, Daiki Takeuchi

Abstract

This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features: we extract a speaker representation used for adaptation directly from the test utterance. Conventional studies of deep neural network (DNN)-based speech enhancement mainly focus on building a speaker-independent model. Meanwhile, in speech applications including speech recognition and synthesis, it is known that model adaptation to the target speaker improves accuracy. Our research question is whether a DNN for speech enhancement can be adapted to unknown speakers without any auxiliary guidance signal in the test phase. To achieve this, we adopt multi-task learning of speech enhancement and speaker identification, and use the output of the final hidden layer of the speaker identification branch as an auxiliary feature. In addition, we use multi-head self-attention to capture long-term dependencies in the speech and noise. Experimental results on a public dataset show that our strategy achieves state-of-the-art performance and also outperforms conventional methods in terms of subjective quality.
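The two ideas named in the abstract can be illustrated with a toy sketch. It is not the authors' model: the mean-pool-and-project speaker embedding is only a stand-in for the final hidden layer of a trained speaker identification branch, and all function names, matrix sizes, and random weights are assumptions chosen for illustration. The sketch derives an utterance-level embedding from the test utterance itself, appends it to every frame, and then applies scaled dot-product multi-head self-attention across the frames.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def append_speaker_embedding(frames, speaker_weights):
    """Hypothetical stand-in for the speaker branch: mean-pool the utterance,
    project it, and append the resulting embedding to every frame."""
    num_frames = len(frames)
    pooled = [sum(col) / num_frames for col in zip(*frames)]
    embedding = [math.tanh(v) for v in matmul([pooled], speaker_weights)[0]]
    return [frame + embedding for frame in frames]


def multi_head_self_attention(frames, weights, num_heads):
    """Scaled dot-product self-attention over T frames of width D.
    weights holds D x D matrices under the keys 'q', 'k', 'v', 'o'."""
    d_head = len(frames[0]) // num_heads
    q = matmul(frames, weights["q"])
    k = matmul(frames, weights["k"])
    v = matmul(frames, weights["v"])
    head_outputs, attention_maps = [], []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        q_h = [row[lo:hi] for row in q]
        k_h = [row[lo:hi] for row in k]
        v_h = [row[lo:hi] for row in v]
        # scores[t][s] = <q_t, k_s>: every frame attends to every other frame.
        scores = matmul(q_h, [list(col) for col in zip(*k_h)])
        attn = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        attention_maps.append(attn)
        head_outputs.append(matmul(attn, v_h))
    # Concatenate the heads along the feature axis, then mix them.
    concat = [sum((head[t] for head in head_outputs), []) for t in range(len(frames))]
    return matmul(concat, weights["o"]), attention_maps


rng = random.Random(0)
frames = random_matrix(5, 6, rng)            # 5 frames, 6 features (toy sizes)
speaker_weights = random_matrix(6, 2, rng)   # 2-dimensional embedding (assumed)
adapted = append_speaker_embedding(frames, speaker_weights)
weights = {name: random_matrix(8, 8, rng) for name in ("q", "k", "v", "o")}
output, attention_maps = multi_head_self_attention(adapted, weights, num_heads=2)
print(len(output), len(output[0]))
```

Because the embedding is computed from the same utterance that is being enhanced, no separate enrollment or guidance signal is needed at test time, which is the property the abstract calls self-adaptation. The attention weights connect every frame to every other frame in one step, which is how self-attention can capture long-term dependencies.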
