Detection of AI-Synthesized Speech Using Cepstral & Bispectral Statistics

Title


Detection of AI-Synthesized Speech Using Cepstral & Bispectral Statistics

Authors

Singh, Arun Kumar; Singh, Priyanka

Abstract


Digital technology has made previously unimaginable applications possible. Having a handful of tools for easy editing and manipulation seems exciting, but it raises alarming concerns, because such tools can be used to propagate speech clones, duplicates, or possibly deepfakes. Validating the authenticity of speech is one of the primary problems of digital audio forensics. We propose an approach that distinguishes human speech from AI-synthesized speech by exploiting bispectral and cepstral analysis. Higher-order statistics show less correlation for human speech than for synthesized speech. In addition, cepstral analysis reveals a durable power component in human speech that is missing in synthesized speech. We integrate both analyses and propose a machine learning model to detect AI-synthesized speech.
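The abstract names two signal-analysis tools without showing them. As a rough illustration only, and not the authors' pipeline (their frame sizes, bifrequency grid, feature set and classifier are not given here), the standard-library Python sketch below computes (a) the real cepstrum of a frame, i.e. the inverse DFT of its log-magnitude spectrum, and (b) a frame-averaged bicoherence estimate, a normalized bispectrum that measures quadratic phase coupling between bins f1, f2 and f1 + f2. The function names, the naive DFT, the normalization, and the hypothetical `tone_frames` test signal are illustrative assumptions, not taken from the paper.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for short illustrative frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(frame, eps=1e-12):
    """Real cepstrum: inverse DFT of the log-magnitude spectrum (eps avoids log(0))."""
    log_mag = [math.log(abs(s) + eps) for s in dft(frame)]
    return [c.real for c in idft(log_mag)]


def bicoherence(frames, f1, f2):
    """Frame-averaged bicoherence at the bin pair (f1, f2).

    The numerator is the summed bispectrum X(f1) X(f2) X*(f1 + f2); the
    Cauchy-Schwarz normalization keeps the result in [0, 1]. Values near 1 mean
    the phase at f1 + f2 is consistently coupled to the phases at f1 and f2.
    """
    num = 0j
    den_pair = 0.0
    den_sum = 0.0
    for frame in frames:
        spectrum = dft(frame)
        k3 = (f1 + f2) % len(spectrum)
        pair = spectrum[f1] * spectrum[f2]
        num += pair * spectrum[k3].conjugate()
        den_pair += abs(pair) ** 2
        den_sum += abs(spectrum[k3]) ** 2
    if den_pair == 0.0 or den_sum == 0.0:
        return 0.0
    return abs(num) / math.sqrt(den_pair * den_sum)


def tone_frames(num_frames, n, bins, coupled, seed=0):
    """Hypothetical test signal: cosines at bins f1, f2 and f1 + f2 with random phases.

    If coupled is True, the phase at f1 + f2 is the sum of the other two phases
    (quadratic phase coupling); otherwise it is drawn independently.
    """
    rng = random.Random(seed)
    f1, f2 = bins
    frames = []
    for _ in range(num_frames):
        p1 = rng.uniform(0, 2 * math.pi)
        p2 = rng.uniform(0, 2 * math.pi)
        p3 = p1 + p2 if coupled else rng.uniform(0, 2 * math.pi)
        frames.append([math.cos(2 * math.pi * f1 * t / n + p1)
                       + math.cos(2 * math.pi * f2 * t / n + p2)
                       + math.cos(2 * math.pi * (f1 + f2) * t / n + p3)
                       for t in range(n)])
    return frames


if __name__ == "__main__":
    echo = [0.0] * 64
    echo[0] = 1.0
    echo[8] = 0.5  # impulse plus an echo delayed by 8 samples
    ceps = real_cepstrum(echo)
    print("cepstral peak at quefrency", max(range(1, 32), key=lambda q: ceps[q]))
    print("coupled bicoherence", bicoherence(tone_frames(64, 64, (5, 9), True), 5, 9))
    print("independent bicoherence", bicoherence(tone_frames(64, 64, (5, 9), False), 5, 9))
```

In the demo, the echoed impulse produces a cepstral peak at quefrency 8 (the echo delay, analogous to a pitch period in voiced speech), the phase-coupled tones give a bicoherence close to 1, and the independently phased tones give a much smaller value. Per-frame cepstral coefficients and bicoherence values over a grid of (f1, f2) pairs could then be summarized into a feature vector for a classifier; which statistics and which model the authors actually use is not stated in this abstract. For real audio, `numpy.fft.fft` and `numpy.fft.ifft` would replace the O(N²) DFT.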
