Paper Title

Extending GCC-PHAT using Shift Equivariant Neural Networks

Authors

Axel Berg, Mark O'Connor, Kalle Åström, Magnus Oskarsson

Abstract

Speaker localization using microphone arrays depends on accurate time delay estimation techniques. For decades, methods based on the generalized cross correlation with phase transform (GCC-PHAT) have been widely adopted for this purpose. Recently, the GCC-PHAT has also been used to provide input features to neural networks in order to remove the effects of noise and reverberation, but at the cost of losing theoretical guarantees in noise-free conditions. We propose a novel approach to extending the GCC-PHAT, where the received signals are filtered using a shift equivariant neural network that preserves the timing information contained in the signals. By extensive experiments we show that our model consistently reduces the error of the GCC-PHAT in adverse environments, with guarantees of exact time delay recovery in ideal conditions.
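
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal NumPy sketch of classical GCC-PHAT time delay estimation between two microphone signals. It is not the authors' shift equivariant network, only the standard method that the paper extends. The function name `gcc_phat_delay`, the `max_delay` parameter, and the small regularization constant are assumptions made for illustration.

```python
import numpy as np


def gcc_phat_delay(x1, x2, max_delay):
    """Estimate the delay (in samples) of x2 relative to x1 with GCC-PHAT.

    Illustrative sketch only: the name and parameters are hypothetical,
    not taken from the paper.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)

    # Cross-power spectrum; the phase transform keeps only the phase.
    cross = X2 * np.conj(X1)
    cross /= np.maximum(np.abs(cross), 1e-12)  # guard against division by zero

    # Back to the lag domain: a pure delay gives a sharp peak at that lag.
    cc = np.fft.irfft(cross, n=n)

    # Reorder so the array covers lags -max_delay .. +max_delay.
    cc = np.concatenate((cc[-max_delay:], cc[: max_delay + 1]))
    return int(np.argmax(cc)) - max_delay


# Usage: a noise-free signal delayed by 7 samples at the second microphone.
rng = np.random.default_rng(0)
source = rng.standard_normal(1024)
true_delay = 7
mic1 = np.concatenate((source, np.zeros(true_delay)))
mic2 = np.concatenate((np.zeros(true_delay), source))
estimated = gcc_phat_delay(mic1, mic2, max_delay=20)
print(estimated)
```

In this noise-free case the second signal is an exact shift of the first, so the whitened cross-spectrum is a pure phase ramp and the correlation peak sits exactly at the true delay. This is the ideal-condition guarantee that, according to the abstract, is lost when GCC-PHAT is used only as an input feature to a neural network.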
