Paper title
Data augmentation versus noise compensation for x-vector speaker recognition systems in noisy environments
Paper authors
Abstract
The explosion of available speech data and new speaker modeling methods based on deep neural networks (DNNs) has made it possible to develop more robust speaker recognition systems. Among DNN speaker modeling techniques, the x-vector system has shown a degree of robustness in noisy environments. Previous studies suggest that increasing the number of speakers in the training data and using data augmentation yield more robust speaker recognition systems in noisy environments. In this work, we ask whether explicit noise compensation techniques remain effective despite the general noise robustness of these systems. We use two different x-vector networks: the first is trained on VoxCeleb1 (Protocol1), and the second on VoxCeleb1+VoxCeleb2 (Protocol2). We propose adding a denoising x-vector subsystem before scoring. Experimental results show that the x-vector system used in Protocol2 is more robust than the one used in Protocol1. Despite this observation, we show that explicit noise compensation gives almost the same relative EER gain in both protocols. For example, in Protocol2, denoising techniques improve EER by 21% to 66%.
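To make the "relative EER gain" reported in the abstract concrete, the following is a minimal sketch of how an equal error rate and a relative gain could be computed from trial scores. The function names, the simple threshold sweep, and all toy scores and EER values are assumptions for illustration only; they are not taken from the paper and do not reproduce its evaluation code or results.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where false acceptance and false rejection rates are
    closest, as the mean of the two rates (an assumed, simple estimator).
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: non-target trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_eer_gain(eer_baseline, eer_denoised):
    """Relative EER improvement in percent (positive means denoising helped)."""
    return 100.0 * (eer_baseline - eer_denoised) / eer_baseline


# Hypothetical toy scores, not data from the paper.
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))  # 0.25

# Hypothetical EERs (in percent) before and after denoising.
print(relative_eer_gain(8.0, 6.0))  # 25.0
```

Under this definition, a drop from a hypothetical 8.0% EER to 6.0% EER counts as a 25% relative gain, which is the sense in which the abstract's 21% to 66% range should be read.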