Improving Speech Enhancement through Fine-Grained Speech Characteristics

Paper title

Improving Speech Enhancement through Fine-Grained Speech Characteristics

Authors

Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

Abstract

Although deep-learning-based speech enhancement systems have made rapid progress in improving the quality of speech signals, they can still produce outputs that contain artifacts and sound unnatural. We propose a novel approach to speech enhancement that aims to improve the perceptual quality and naturalness of enhanced signals by optimizing for key characteristics of speech. We first identify key acoustic parameters that have been found to correlate well with voice quality (e.g., jitter, shimmer, and spectral flux) and then propose objective functions that reduce the difference between clean speech and enhanced speech with respect to these features. The full set of acoustic features is the extended Geneva Acoustic Parameter Set (eGeMAPS), which includes 25 different attributes associated with the perception of speech. Given the non-differentiable nature of these feature computations, we first build differentiable estimators of the eGeMAPS features and then use them to fine-tune existing speech enhancement systems. Our approach is generic and can be applied to any existing deep-learning-based enhancement system to further improve the enhanced speech signals. Experiments conducted on the Deep Noise Suppression (DNS) Challenge dataset show that our approach can improve state-of-the-art deep-learning-based enhancement systems.
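
The fine-tuning objective described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation. The abstract does not give the distance measure, the weighting, or the form of the estimators, so the mean absolute difference, the `weight` parameter, and the `toy_estimator` function (a hypothetical stand-in for a learned, differentiable eGeMAPS estimator) are all assumptions made for illustration. A real system would compute this inside an automatic-differentiation framework; plain Python is used here only to show the shape of the objective.

```python
from typing import Callable, List, Sequence

# An estimator maps a waveform to a vector of acoustic-parameter estimates.
Estimator = Callable[[Sequence[float]], List[float]]


def toy_estimator(signal: Sequence[float], frame_len: int = 4) -> List[float]:
    """Hypothetical stand-in for a differentiable acoustic-parameter estimator.

    Returns two crude descriptors: the mean frame energy and the mean
    absolute change in energy between consecutive frames.
    """
    if not signal:
        raise ValueError("signal must not be empty")
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    energies = [sum(x * x for x in frame) / len(frame) for frame in frames]
    mean_energy = sum(energies) / len(energies)
    changes = [abs(b - a) for a, b in zip(energies, energies[1:])]
    mean_change = sum(changes) / len(changes) if changes else 0.0
    return [mean_energy, mean_change]


def acoustic_parameter_loss(
    clean: Sequence[float], enhanced: Sequence[float], estimator: Estimator
) -> float:
    """Mean absolute difference between estimated parameters (an assumption)."""
    clean_params = estimator(clean)
    enhanced_params = estimator(enhanced)
    diffs = [abs(c - e) for c, e in zip(clean_params, enhanced_params)]
    return sum(diffs) / len(diffs)


def fine_tuning_objective(
    base_loss: float,
    clean: Sequence[float],
    enhanced: Sequence[float],
    estimator: Estimator,
    weight: float = 0.1,  # assumed trade-off factor, not taken from the paper
) -> float:
    """Existing enhancement loss plus a weighted acoustic-parameter term."""
    return base_loss + weight * acoustic_parameter_loss(clean, enhanced, estimator)
```

In this sketch the extra term vanishes when the enhanced signal matches the clean signal and grows as their estimated parameters diverge, which is the behavior the abstract asks of the proposed objective functions.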
