Paper Title

Adversarial Multi-Task Deep Learning for Noise-Robust Voice Activity Detection with Low Algorithmic Delay

Authors

Larsen, Claus Meyer; Koch, Peter; Tan, Zheng-Hua

Abstract

Voice Activity Detection (VAD) is an important pre-processing step in a wide variety of speech processing systems. In a practical application, a VAD should be able to detect speech in both noisy and noise-free environments without introducing significant latency. In this work we propose using an adversarial multi-task learning method when training a supervised VAD. The method has been applied to the state-of-the-art VAD, Waveform-based Voice Activity Detection. Additionally, the performance of the VAD is investigated under different algorithmic delays, an important factor in latency. Introducing adversarial multi-task learning to the model is observed to increase performance in terms of Area Under Curve (AUC), particularly in noisy environments, while performance is not degraded at higher SNR levels. The adversarial multi-task learning is only applied in the training phase and thus introduces no additional cost in testing. Furthermore, the correlation between performance and algorithmic delay is investigated, and it is observed that the VAD performance degradation is only moderate when lowering the algorithmic delay from 398 ms to 23 ms.
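
The abstract reports performance as Area Under Curve (AUC). As a rough illustration of that metric only, the sketch below computes the area under the ROC curve for frame-level VAD scores using the pairwise-ranking formulation. It assumes binary labels (1 for speech, 0 for non-speech) and one score per frame. The function name `roc_auc` and the toy data are hypothetical and are not taken from the paper or its evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    labels: iterable of 0/1 frame labels (1 = speech), an assumed convention.
    scores: iterable of per-frame speech scores, higher means more speech-like.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one speech and one non-speech frame")

    # Fraction of (speech, non-speech) pairs ranked correctly; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: four frames, two of them speech.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The double loop is quadratic in the number of frames and is meant for clarity; a sort-based rank computation would be the usual choice for long recordings.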
