A non-causal FFTNet architecture for speech enhancement

Paper title

A non-causal FFTNet architecture for speech enhancement

Authors

Muhammed PV Shifas, Nagaraj Adiga, Vassilis Tsiaras, Yannis Stylianou

Abstract

In this paper, we propose a new parallel, non-causal and shallow waveform-domain architecture for speech enhancement based on FFTNet, a neural network for generating high-quality audio waveforms. In contrast to other waveform-based approaches such as WaveNet, FFTNet uses an initial wide dilation pattern. Such an architecture better represents the long-term correlated structure of speech in the time domain, where noise is usually highly uncorrelated, and is therefore suitable for waveform-domain speech enhancement. To further strengthen this feature of FFTNet, we propose a non-causal FFTNet architecture, in which the present sample in each layer is estimated from the past and future samples of the previous layer. By using a shallow network and applying non-causality within certain limits, the proposed FFTNet for speech enhancement (SE-FFTNet) uses far fewer parameters than other neural-network-based approaches to speech enhancement such as WaveNet and SEGAN. Specifically, the proposed network has 32% fewer model parameters than WaveNet and 87% fewer than SEGAN. Finally, based on subjective and objective metrics, SE-FFTNet outperforms WaveNet in terms of enhanced signal quality, while performing on par with SEGAN. A TensorFlow implementation of the architecture is provided (see footnote 1).
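The two structural ideas in the abstract, a wide initial dilation and non-causal estimation from past and future samples, can be illustrated with a small NumPy sketch. This is not the authors' TensorFlow implementation. The function names, the scalar weights, the ReLU nonlinearity, and the dilation schedule `[8, 4, 2, 1]` are all assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np


def noncausal_layer(x, d, w_past=0.5, w_future=0.5, b=0.0):
    """Illustrative non-causal layer (hypothetical, not the paper's exact layer).

    Computes y[t] = relu(w_past * x[t - d] + w_future * x[t + d] + b),
    so each output sample depends on one past and one future input sample.
    Requires d >= 1.
    """
    padded = np.pad(x, (d, d))   # zero-pad both ends to keep the input length
    past = padded[:-2 * d]       # past[t]   == x[t - d]
    future = padded[2 * d:]      # future[t] == x[t + d]
    return np.maximum(w_past * past + w_future * future + b, 0.0)


def noncausal_stack(x, dilations=(8, 4, 2, 1)):
    """Apply layers from the widest dilation to the narrowest (assumed schedule)."""
    for d in dilations:
        x = noncausal_layer(x, d)
    return x


def receptive_span(dilations):
    """Number of input samples spanned by one output sample, past and future."""
    return 2 * sum(dilations) + 1


# Feed a unit impulse through the stack to see which outputs it influences.
x = np.zeros(64)
x[32] = 1.0
y = noncausal_stack(x)
influenced = np.nonzero(y)[0]
```

In this sketch the impulse at index 32 influences output samples on both sides of it, which is what distinguishes a non-causal layer from a causal one: a causal stack would only affect samples at or after the impulse.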
