Detecting Synthetic Speech Manipulation in Real Audio Recordings

Paper Title

Detecting Synthetic Speech Manipulation in Real Audio Recordings

Authors

Rahman, Md Hafizur; Graciarena, Martin; Castan, Diego; Cobo-Kroenke, Chris; McLaren, Mitchell; Lawson, Aaron

Abstract

Recent advances in artificial speech and audio technologies have improved the ability of deepfake operators to falsify media and spread malicious misinformation. Anyone with limited coding skills can use freely available speech synthesis tools to create convincing simulations of influential speakers' voices with the malicious intent of distorting the original message. With the latest technology, malicious operators do not have to generate an entire audio clip; instead, they can insert a partial manipulation or a segment of synthetic speech into a genuine audio recording to change the entire context and meaning of the original message. Detecting these insertions is especially challenging because partially manipulated audio can evade synthetic speech detectors more easily than entirely fake messages can. This paper describes a potential partial synthetic speech detection system based on the x-ResNet architecture with a probabilistic linear discriminant analysis (PLDA) backend and interleaved-aware score processing. Experimental results suggest that the PLDA backend yields a 25% average error reduction on partially synthesized datasets relative to a non-PLDA baseline.
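For readers unfamiliar with how an "average error reduction" figure is typically derived, the following is a minimal sketch. It assumes the figure is the mean of per-dataset relative reductions, which the abstract does not confirm. The function names and the error rates are hypothetical and chosen purely for illustration; they are not results from the paper.

```python
def relative_error_reduction(baseline_error, system_error):
    """Fraction by which system_error improves on baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline_error - system_error) / baseline_error


def average_relative_reduction(pairs):
    """Mean of per-dataset relative reductions; pairs = [(baseline, system), ...]."""
    reductions = [relative_error_reduction(b, s) for b, s in pairs]
    return sum(reductions) / len(reductions)


# Hypothetical error rates in percent (NOT from the paper):
# (non-PLDA baseline, PLDA backend) for three partially synthesized test sets.
hypothetical_results = [(8.0, 6.0), (12.0, 9.6), (20.0, 14.0)]

average = average_relative_reduction(hypothetical_results)
print(f"average relative error reduction: {average:.0%}")
```

Averaging per-dataset ratios weights every dataset equally regardless of size; pooling all trials before computing the error would generally give a different number.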

Paper Title