Paper Title
Training Noisy Single-Channel Speech Separation With Noisy Oracle Sources: A Large Gap and A Small Step
Authors
Abstract
As the performance of single-channel speech separation systems has improved, there has been a desire to move to more challenging conditions than the clean, near-field speech that initial systems were developed on. When training deep learning separation models, a need for ground truth leads to training on synthetic mixtures. As such, training in noisy conditions requires either using noise synthetically added to clean speech, preventing the use of in-domain data for a noisy-condition task, or training using mixtures of noisy speech, requiring the network to additionally separate the noise. We demonstrate the relative inseparability of noise and that this noisy speech paradigm leads to significant degradation of system performance. We also propose an SI-SDR-inspired training objective that tries to exploit the inseparability of noise to implicitly partition the signal and discount noise separation errors, enabling the training of better separation systems with noisy oracle sources.
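For readers unfamiliar with the metric the abstract refers to, the following is a minimal sketch of the standard scale-invariant signal-to-distortion ratio (SI-SDR) in NumPy. It is not the training objective proposed in the paper, whose form the abstract does not give. The function name `si_sdr`, the `eps` stabilizer, and the mean-removal step are assumptions made for this illustration.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Standard SI-SDR in dB between a 1-D estimate and a 1-D reference.

    Illustrative sketch only: the name, the eps value, and the mean
    removal are assumptions, not details taken from the paper.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Remove DC offset so the projection is not biased by a constant term.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Project the estimate onto the reference; this optimal scaling is
    # what makes the measure invariant to the gain of the estimate.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference

    # Everything in the estimate not explained by the scaled reference
    # is counted as distortion.
    residual = estimate - target

    ratio = (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)
```

When the reference is itself a noisy oracle source, the noise it contains is treated as part of the target by this standard formulation, which is the situation the abstract's proposed objective aims to address.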