Paper Title
Efficient Integration of Multi-channel Information for Speaker-independent Speech Separation
Paper Authors
Paper Abstract
Although deep-learning-based methods have markedly improved the performance of speech separation over the past few years, how to integrate multi-channel signals for speech separation remains an open question. We propose two methods, namely early fusion and late fusion, to integrate multi-channel information based on the time-domain audio separation network (TasNet), which has proven effective in single-channel speech separation. We also propose channel-sequential transfer learning, a transfer learning framework that uses the parameters trained for a lower-channel network as the initial values of a higher-channel network. For a fair comparison, we evaluated our proposed methods on the open-source spatialized version of the wsj0-2mix dataset. We found that our proposed methods outperform multi-channel deep clustering and that their performance improves in proportion to the number of microphones. We also found that the late-fusion method consistently outperforms the single-channel method regardless of the angle difference between speakers.
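The two fusion strategies and the transfer-learning initialization described above can be sketched in a toy form. The sketch below is an illustration under assumed simplifications, not the paper's actual TasNet architecture: the separation network is replaced by a single linear map, late fusion combines per-channel outputs by averaging, and the transfer-learning step copies lower-channel weights into the overlapping slice of a higher-channel weight matrix (the exact fusion points and initialization scheme in the paper may differ).

```python
import numpy as np

rng = np.random.default_rng(0)

T = 8            # samples per channel (toy length)
n_speakers = 2
n_channels = 4

def separate(weights, features):
    """Toy stand-in for a separation network such as TasNet:
    a linear map from an input vector to per-speaker estimates."""
    return (weights @ features).reshape(n_speakers, T)

# --- Early fusion: concatenate all channels, run one network. ---
def early_fusion(mixture, weights):
    # mixture: (n_channels, T); flatten channels into one input vector
    fused_input = mixture.reshape(-1)
    return separate(weights, fused_input)

# --- Late fusion: run a shared network per channel, then average
#     the per-channel speaker estimates. ---
def late_fusion(mixture, weights):
    per_channel = [separate(weights, ch) for ch in mixture]
    return np.mean(per_channel, axis=0)

mixture = rng.standard_normal((n_channels, T))
W_early = rng.standard_normal((n_speakers * T, n_channels * T))
W_late = rng.standard_normal((n_speakers * T, T))

est_early = early_fusion(mixture, W_early)   # shape (2, 8)
est_late = late_fusion(mixture, W_late)      # shape (2, 8)

# --- Channel-sequential transfer learning (sketch): initialize the
#     4-channel early-fusion weights from trained 2-channel weights
#     by copying the overlapping input columns; the remaining columns
#     start at zero. (Hypothetical scheme for illustration only.)
W_2ch_trained = rng.standard_normal((n_speakers * T, 2 * T))
W_4ch_init = np.zeros((n_speakers * T, n_channels * T))
W_4ch_init[:, : 2 * T] = W_2ch_trained
```

The point of the contrast: early fusion lets the network see all channels jointly from the first layer, while late fusion reuses a single-channel separator per microphone and only merges at the output, which is why it can degrade gracefully toward single-channel performance.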