Improving Frame-Online Neural Speech Enhancement with Overlapped-Frame Prediction

Paper Title

Improving Frame-Online Neural Speech Enhancement with Overlapped-Frame Prediction

Authors

Zhong-Qiu Wang, Shinji Watanabe

Abstract

Frame-online speech enhancement systems in the short-time Fourier transform (STFT) domain usually have an algorithmic latency equal to the window size due to the use of overlap-add in the inverse STFT (iSTFT). This algorithmic latency allows the enhancement models to leverage future contextual information up to a length equal to the window size. However, this information is only partially leveraged by current frame-online systems. To fully exploit it, we propose an overlapped-frame prediction technique for deep learning based frame-online speech enhancement, where at each frame our deep neural network (DNN) predicts the current and several past frames that are necessary for overlap-add, instead of only predicting the current frame. In addition, we propose a loss function to account for the scale difference between predicted and oracle target signals. Experiments on a noisy-reverberant speech enhancement task show the effectiveness of the proposed algorithms.
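The overlap-add bookkeeping behind overlapped-frame prediction can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`emit_segment`, `stream_enhance`, `identity_predictor`) are hypothetical, the DNN is replaced by a stand-in that simply copies the input, and a rectangular window scaled by 1/K is assumed so that the overlapping frames sum back to the signal. The sketch only shows that, at frame t, the hop-sized output segment is assembled from the current frame and the K-1 past frames, all predicted at frame t, while the algorithmic latency stays at one window.

```python
import numpy as np

def emit_segment(frames, hop):
    """Overlap-add the K frames predicted at the current step.

    frames: (K, W) array, oldest first; frames[-1] is the current frame t.
    Returns the hop-sized output segment [t*hop, (t+1)*hop), which no
    later frame overlaps and which is therefore final at frame t.
    """
    K, W = frames.shape
    assert W == K * hop, "sketch assumes the window is a multiple of the hop"
    seg = np.zeros(hop)
    for j in range(K):
        k = K - 1 - j  # frame j started k hops before the current frame
        seg += frames[j, k * hop:(k + 1) * hop]
    return seg

def stream_enhance(x, predict_frames, win, hop):
    """Run a frame-online loop; the predictor sees input only up to the end of frame t."""
    K = win // hop
    out = []
    for t in range(K - 1, (len(x) - win) // hop + 1):
        frames = predict_frames(x[: t * hop + win], t)  # current + K-1 past frames
        out.append(emit_segment(frames, hop))
    return np.concatenate(out)

def identity_predictor(win, hop):
    """Stand-in for the DNN (assumption): returns the input frames scaled by 1/K."""
    K = win // hop
    def predict(x_seen, t):
        starts = [(t - k) * hop for k in range(K - 1, -1, -1)]  # oldest first
        return np.stack([x_seen[s:s + win] / K for s in starts])
    return predict

win, hop = 8, 2
x = np.arange(32, dtype=float)
y = stream_enhance(x, identity_predictor(win, hop), win, hop)
```

With the copy-through stand-in, `y` reproduces the input samples from the first fully overlapped segment onward. In a conventional frame-online system, the past frames in `frames` would instead be the outputs saved from earlier steps, each produced with less future context.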
