Frequency Gating: Improved Convolutional Neural Networks for Speech Enhancement in the Time-Frequency Domain

Paper Title

Frequency Gating: Improved Convolutional Neural Networks for Speech Enhancement in the Time-Frequency Domain

Authors

Koen Oostermeijer, Qing Wang, Jun Du

Abstract

One of the strengths of traditional convolutional neural networks (CNNs) is their inherent translational invariance. However, for the task of speech enhancement in the time-frequency domain, this property cannot be fully exploited due to a lack of invariance in the frequency direction. In this paper we propose to remedy this inefficiency by introducing a method, which we call Frequency Gating, to compute multiplicative weights for the kernels of the CNN in order to make them frequency dependent. Several mechanisms are explored: temporal gating, in which weights are dependent on prior time frames, local gating, whose weights are generated based on a single time frame and the ones adjacent to it, and frequency-wise gating, where each kernel is assigned a weight independent of the input data. Experiments with an autoencoder neural network with skip connections show that both local and frequency-wise gating outperform the baseline and are therefore viable ways to improve CNN-based speech enhancement neural networks. In addition, a loss function based on the extended short-time objective intelligibility score (ESTOI) is introduced, which we show to outperform the standard mean squared error (MSE) loss function.
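
The frequency-wise gating variant described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tensor layout (`[freq][time]`), the sigmoid squashing of the gate, and the choice to apply the gate to each kernel's output (equivalent, for a bias-free convolution, to scaling the kernel at each frequency position) are all assumptions made for illustration. The abstract only states that each kernel receives a multiplicative weight that is independent of the input data.

```python
import math


def conv2d_same(x, kernel):
    """Naive 2-D cross-correlation with zero padding.

    x is a spectrogram-like list of lists indexed as x[freq][time].
    The same kernel is applied at every frequency bin, which is the
    translation invariance the abstract refers to.
    """
    n_freq, n_time = len(x), len(x[0])
    kf, kt = len(kernel), len(kernel[0])
    pf, pt = kf // 2, kt // 2
    out = [[0.0] * n_time for _ in range(n_freq)]
    for f in range(n_freq):
        for t in range(n_time):
            acc = 0.0
            for i in range(kf):
                for j in range(kt):
                    ff, tt = f + i - pf, t + j - pt
                    if 0 <= ff < n_freq and 0 <= tt < n_time:
                        acc += kernel[i][j] * x[ff][tt]
            out[f][t] = acc
    return out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def frequency_wise_gated_conv(x, kernels, gate_logits):
    """Apply each kernel, then scale its output per frequency bin.

    gate_logits[k][f] is a (hypothetical) learnable parameter for kernel k
    at frequency bin f. It does not depend on the input x, matching the
    "frequency-wise" mechanism; the sigmoid is an assumed design choice.
    """
    outputs = []
    for kernel, logits in zip(kernels, gate_logits):
        y = conv2d_same(x, kernel)
        gated = [[sigmoid(logits[f]) * v for v in row]
                 for f, row in enumerate(y)]
        outputs.append(gated)
    return outputs


# Usage: one 1x1 kernel on a 3-bin, 2-frame input; the gate leaves bin 1
# almost untouched, halves bin 0, and nearly silences bin 2.
x = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
y = frequency_wise_gated_conv(x, [[[1.0]]], [[0.0, 100.0, -100.0]])
```

Temporal and local gating would differ only in where the gate values come from: instead of free parameters, they would be computed from prior time frames or from a frame and its neighbours, respectively.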
