PercepNet+: A Phase and SNR Aware PercepNet for Real-Time Speech Enhancement

Paper title

PercepNet+: A Phase and SNR Aware PercepNet for Real-Time Speech Enhancement

Authors

Xiaofeng Ge, Jiangyu Han, Yanhua Long, Haixin Guan

Abstract

PercepNet, a recent extension of RNNoise, is an efficient, high-quality, real-time full-band speech enhancement technique that has shown promising performance in various public deep noise suppression tasks. This paper proposes a new approach, named PercepNet+, that extends PercepNet with four significant improvements. First, we introduce a phase-aware structure that brings phase information into PercepNet by adding complex features and complex subband gains as the deep network input and output, respectively. Second, a signal-to-noise ratio (SNR) estimator and an SNR-switched post-processing step are specially designed to alleviate the over-attenuation (OA) that appears in high-SNR conditions with the original PercepNet. Third, the GRU layer is replaced by TF-GRU to model both temporal and frequency dependencies. Finally, we propose to integrate the losses for complex subband gain, SNR, and pitch filtering strength with an OA loss in a multi-objective learning manner to further improve speech enhancement performance. Experimental results show that the proposed PercepNet+ significantly outperforms the original PercepNet in terms of both PESQ and STOI, without a large increase in model size.
