Paper title
Position tracking of a varying number of sound sources with sliding permutation invariant training
Paper authors
Paper abstract
Recent data- and learning-based sound source localization (SSL) methods have shown strong performance in challenging acoustic scenarios. However, little work has been done on adapting such methods to consistently track multiple sources that appear and disappear, as would occur in reality. In this paper, we present a new training strategy for deep learning SSL models with a straightforward implementation based on the mean squared error of the optimal association between estimated and reference positions in the preceding time frames. It optimizes the desired properties of a tracking system: handling a time-varying number of sources and ordering localization estimates according to their trajectories, minimizing identity switches (IDSs). Evaluation on simulated data of multiple reverberant moving sources and on two model architectures demonstrates its effectiveness in reducing identity switches without compromising frame-wise localization accuracy.
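The idea of scoring each frame by the best association over the preceding frames can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `sliding_pit_loss`, the `window` parameter, the fixed number of output tracks, and the brute-force search over permutations are all assumptions made for illustration, and the handling of appearing and disappearing sources mentioned in the abstract is omitted.

```python
from itertools import permutations

import numpy as np


def sliding_pit_loss(est, ref, window=3):
    """Hypothetical sliding permutation-invariant MSE (illustrative only).

    est, ref: arrays of shape (frames, tracks, dims).
    For each frame t, every permutation of the estimated tracks is scored by
    its mean squared error over the trailing window of frames ending at t,
    and the smallest value is kept. The result is the mean over all frames.
    """
    est = np.asarray(est, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n_frames, n_tracks, _ = est.shape
    # Brute force over all track orderings; only practical for few tracks.
    perms = [list(p) for p in permutations(range(n_tracks))]

    frame_losses = []
    for t in range(n_frames):
        start = max(0, t - window + 1)  # window is shorter at the start
        est_win = est[start:t + 1]
        ref_win = ref[start:t + 1]
        # One permutation is applied to the whole window, so an ordering that
        # changes inside the window cannot reach zero error.
        best = min(np.mean((est_win[:, p] - ref_win) ** 2) for p in perms)
        frame_losses.append(best)
    return float(np.mean(frame_losses))


# Two static sources at positions 0.0 and 1.0 over six frames.
ref = np.tile(np.array([[0.0], [1.0]]), (6, 1, 1))
# Estimates that are exact in every frame but swap track order at frame 3.
switched = ref.copy()
switched[3:] = switched[3:, ::-1]

loss_framewise = sliding_pit_loss(switched, ref, window=1)
loss_sliding = sliding_pit_loss(switched, ref, window=3)
```

With `window=1` the sketch reduces to a frame-wise permutation-invariant loss, which cannot see the identity switch in `switched` because every individual frame is perfectly localized. With a longer window the same estimates incur a positive loss, which is the property that penalizes identity switches.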