Paper Title
Efficient Integration of Multi-channel Information for Speaker-independent Speech Separation
Paper Authors
Paper Abstract
Although deep-learning-based methods have markedly improved the performance of speech separation over the past few years, how to integrate multi-channel signals for speech separation remains an open question. We propose two methods, namely early fusion and late fusion, to integrate multi-channel information based on the time-domain audio separation network (TasNet), which has proven effective in single-channel speech separation. We also propose channel-sequential transfer learning, a transfer learning framework that uses the parameters trained for a lower-channel network as the initial values of a higher-channel network. For a fair comparison, we evaluated our proposed methods on the open-source spatialized version of the wsj0-2mix dataset. We found that our proposed methods outperform multi-channel deep clustering and that their performance improves in proportion to the number of microphones. We also found that the late-fusion method consistently outperforms the single-channel method regardless of the angle difference between speakers.
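The two fusion strategies and the transfer-learning initialization described above can be sketched in a toy form. The sketch below is an illustration under assumed simplifications, not the paper's actual TasNet architecture: the separation network is replaced by a single linear map, late fusion combines per-channel outputs by averaging, and the transfer-learning step copies lower-channel weights into the overlapping slice of a higher-channel weight matrix (the exact fusion points and initialization scheme in the paper may differ).

```python
import numpy as np

rng = np.random.default_rng(0)

T = 8            # samples per channel (toy length)
n_speakers = 2
n_channels = 4

def separate(weights, features):
    """Toy stand-in for a separation network such as TasNet:
    a linear map from an input vector to per-speaker estimates."""
    return (weights @ features).reshape(n_speakers, T)

# --- Early fusion: concatenate all channels, run one network. ---
def early_fusion(mixture, weights):
    # mixture: (n_channels, T); flatten channels into one input vector
    fused_input = mixture.reshape(-1)
    return separate(weights, fused_input)

# --- Late fusion: run a shared network per channel, then average
#     the per-channel speaker estimates. ---
def late_fusion(mixture, weights):
    per_channel = [separate(weights, ch) for ch in mixture]
    return np.mean(per_channel, axis=0)

mixture = rng.standard_normal((n_channels, T))
W_early = rng.standard_normal((n_speakers * T, n_channels * T))
W_late = rng.standard_normal((n_speakers * T, T))

est_early = early_fusion(mixture, W_early)   # shape (2, 8)
est_late = late_fusion(mixture, W_late)      # shape (2, 8)

# --- Channel-sequential transfer learning (sketch): initialize the
#     4-channel early-fusion weights from trained 2-channel weights
#     by copying the overlapping input columns; the remaining columns
#     start at zero. (Hypothetical scheme for illustration only.)
W_2ch_trained = rng.standard_normal((n_speakers * T, 2 * T))
W_4ch_init = np.zeros((n_speakers * T, n_channels * T))
W_4ch_init[:, : 2 * T] = W_2ch_trained
```

The point of the contrast: early fusion lets the network see all channels jointly from the first layer, while late fusion reuses a single-channel separator per microphone and only merges at the output, which is why it can degrade gracefully toward single-channel performance.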