Paper title
End-to-end Two-dimensional Sound Source Localization With Ad-hoc Microphone Arrays
Authors
Abstract
Conventional sound source localization methods are mostly based on a single microphone array that consists of multiple microphones, and they usually formulate localization as a direction-of-arrival estimation problem. In this paper, we propose a deep-learning-based end-to-end sound source localization method with ad-hoc microphone arrays, where an ad-hoc microphone array is a set of randomly distributed microphone arrays that collaborate with each other. It can produce two-dimensional locations of speakers with only a single microphone per node. Specifically, we divide a targeted indoor space into multiple local areas and encode each local area by a one-hot code, so that both the node locations and the speaker locations can be represented by one-hot codes. Accordingly, the sound source localization problem is formulated as a classification task: recognizing the one-hot code of the speaker given the one-hot codes of the microphone nodes and their speech recordings. An end-to-end spatial-temporal deep model is designed for this classification problem. It uses a spatial-temporal attention architecture with a fusion layer inserted in the middle of the architecture, which allows it to handle an arbitrary number of microphone nodes during both model training and testing. Experimental results show that the proposed method yields good performance in highly reverberant and noisy environments.
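The area encoding described in the abstract can be illustrated with a minimal sketch. It assumes a rectangular room divided into a uniform grid, which the abstract does not specify; the function names, the room size, and the grid resolution are all hypothetical and are not taken from the paper.

```python
def area_index(x, y, room_w, room_d, nx, ny):
    """Map a 2-D position (metres) to the index of the grid cell that contains it.

    Assumption: the room is a room_w x room_d rectangle split into nx x ny
    equal cells, numbered row by row. The paper may partition space differently.
    """
    if not (0 <= x <= room_w and 0 <= y <= room_d):
        raise ValueError("position lies outside the room")
    # Clamp so that points on the far walls fall into the last cell.
    col = min(int(x / room_w * nx), nx - 1)
    row = min(int(y / room_d * ny), ny - 1)
    return row * nx + col


def one_hot(index, size):
    """Return a list of length `size` with a single 1 at position `index`."""
    code = [0] * size
    code[index] = 1
    return code


def decode_center(code, room_w, room_d, nx, ny):
    """Convert a one-hot code back to the centre coordinates of its cell."""
    row, col = divmod(code.index(1), nx)
    return ((col + 0.5) * room_w / nx, (row + 0.5) * room_d / ny)


# Hypothetical 6 m x 4 m room split into 3 x 2 local areas.
speaker_code = one_hot(area_index(2.5, 1.0, 6.0, 4.0, 3, 2), 3 * 2)
```

Under these assumptions, a classifier would predict `speaker_code` from the recordings and the node codes, and `decode_center` would turn the predicted class back into a two-dimensional position whose resolution is limited by the cell size.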