Perceptual Contrast Stretching on Target Feature for Speech Enhancement

Paper Title

Perceptual Contrast Stretching on Target Feature for Speech Enhancement

Authors

Rong Chao, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao

Abstract

Speech enhancement (SE) performance has improved considerably owing to the use of deep learning models as a base function. Herein, we propose a perceptual contrast stretching (PCS) approach to further improve SE performance. The PCS is derived based on the critical band importance function and is applied to modify the targets of the SE model. Specifically, the contrast of target features is stretched based on perceptual importance, thereby improving the overall SE performance. Compared with post-processing-based implementations, incorporating PCS into the training phase preserves performance and reduces online computation. Notably, PCS can be combined with different SE model architectures and training criteria. Furthermore, PCS does not affect the causality or convergence of SE model training. Experimental results on the VoiceBank-DEMAND dataset show that the proposed method can achieve state-of-the-art performance on both causal (PESQ score = 3.07) and noncausal (PESQ score = 3.35) SE tasks.
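
To make the idea of stretching target features concrete, here is a minimal sketch. It is not taken from the abstract: it assumes the stretching is a per-frequency-bin scaling of the log-compressed magnitude, log(1 + m), which amounts to raising (1 + m) to the power of the bin's weight. The function names and the weight values are hypothetical placeholders and are not the band importance values used in the paper.

```python
import math


def stretch_targets(magnitudes, band_weights):
    """Scale each bin's log-compressed magnitude by a per-bin weight.

    magnitudes: list of frames, each a list of non-negative magnitudes.
    band_weights: one weight per frequency bin (hypothetical values);
        a weight above 1 widens the dynamic range of that bin.
    """
    stretched = []
    for frame in magnitudes:
        if len(frame) != len(band_weights):
            raise ValueError("each frame needs one weight per frequency bin")
        stretched.append(
            [w * math.log1p(m) for m, w in zip(frame, band_weights)]
        )
    return stretched


def to_linear(stretched):
    """Map stretched log-magnitudes back to linear magnitudes."""
    return [[math.expm1(v) for v in frame] for frame in stretched]


# Placeholder weights for a 4-bin toy spectrum (assumption, not the paper's).
weights = [1.0, 1.2, 1.4, 1.0]
clean = [[0.0, 1.0, 2.0, 3.0], [0.5, 0.5, 0.5, 0.5]]

target = stretch_targets(clean, weights)  # would serve as the training target
restored = to_linear(target)              # stretched magnitudes, linear scale
```

In this sketch, bins with weight 1.0 pass through unchanged, while bins with a larger weight come out with a magnitude of (1 + m)^w - 1, so louder components in those bins are emphasized more than quiet ones. Because the modification is applied to the training targets, nothing extra has to be computed at inference time, which matches the abstract's point about reduced online computation.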
