Deepfake audio detection by speaker verification

Paper title

Deepfake audio detection by speaker verification

Authors

Alessandro Pianese, Davide Cozzolino, Giovanni Poggi, Luisa Verdoliva

Abstract

Thanks to recent advances in deep learning, sophisticated generation tools exist, nowadays, that produce extremely realistic synthetic speech. However, malicious uses of such tools are possible and likely, posing a serious threat to our society. Hence, synthetic voice detection has become a pressing research topic, and a large variety of detection methods have been recently proposed. Unfortunately, they hardly generalize to synthetic audios generated by tools never seen in the training phase, which makes them unfit to face real-world scenarios. In this work, we aim at overcoming this issue by proposing a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific manipulations. Since the detector is trained only on real data, generalization is automatically ensured. The proposed approach can be implemented based on off-the-shelf speaker verification tools. We test several such solutions on three popular test sets, obtaining good performance, high generalization ability, and high robustness to audio impairment.
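
As a rough illustration of the idea in the abstract, the sketch below scores a test utterance by how closely its speaker embedding matches reference embeddings of the claimed speaker, and flags it when the match is weak. This is only a minimal sketch under stated assumptions, not the authors' implementation: the embeddings are toy vectors standing in for the output of an off-the-shelf speaker verification model, and the function names, the max-over-references scoring rule, and the 0.5 threshold are all hypothetical choices made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identity_score(test_embedding, reference_embeddings):
    """Highest similarity between the test embedding and any reference.

    Assumption: taking the maximum is one simple way to aggregate over
    several real recordings of the claimed speaker.
    """
    return max(
        cosine_similarity(test_embedding, ref) for ref in reference_embeddings
    )


def is_likely_fake(test_embedding, reference_embeddings, threshold=0.5):
    """Flag the utterance when it does not match the claimed speaker.

    Assumption: the threshold is arbitrary here; in practice it would be
    calibrated on real speech only.
    """
    return identity_score(test_embedding, reference_embeddings) < threshold


# Toy embeddings (hypothetical values, not produced by a real model).
references = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
consistent = [0.95, 0.05, 0.0]    # close to the reference speaker
inconsistent = [0.0, 1.0, 0.0]    # far from the reference speaker

print(is_likely_fake(consistent, references))
print(is_likely_fake(inconsistent, references))
```

Because the decision depends only on reference recordings of the real speaker, nothing in this sketch is tied to a particular synthesis tool, which is the property the abstract relies on for generalization.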
