Paper Title
Detection of AI-Synthesized Speech Using Cepstral & Bispectral Statistics
Paper Authors
Abstract
Digital technology has made previously unimaginable applications possible. Having a handful of tools for easy editing and manipulation may seem exciting, but it raises alarming concerns, because manipulated audio can spread as speech clones, duplicates, or even deepfakes. Validating the authenticity of speech is one of the primary problems in digital audio forensics. We propose an approach that distinguishes human speech from AI-synthesized speech by exploiting bispectral and cepstral analysis. Higher-order statistics show lower correlation for human speech than for synthesized speech. In addition, cepstral analysis reveals a durable power component in human speech that is missing in synthesized speech. We integrate both analyses and propose a machine learning model to detect AI-synthesized speech.
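The abstract does not specify the estimators, frame sizes, features, or classifier, so the following is only a rough sketch of the two kinds of statistics it names. It uses a segment-averaged bicoherence (a normalized bispectrum, one common way to quantify higher-order correlation) and the real cepstrum (inverse transform of the log magnitude spectrum). The function names `bicoherence`, `real_cepstrum`, and `features`, the segment length, the number of cepstral coefficients, and the mean/std summary statistics are all hypothetical choices, not the authors' method. A naive stdlib DFT is used so the sketch runs without dependencies.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (stdlib only, for illustration)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT matching dft() above."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(frame, eps=1e-12):
    """Real cepstrum: inverse DFT of the log magnitude spectrum.

    eps avoids log(0) for spectral bins that are exactly zero.
    """
    log_mag = [math.log(abs(c) + eps) for c in dft(frame)]
    return [c.real for c in idft(log_mag)]


def bicoherence(signal, seg_len):
    """Segment-averaged bicoherence |B(f1, f2)| for bin pairs with f1 + f2 < seg_len // 2.

    The bispectrum X(f1) X(f2) X*(f1 + f2) is averaged over non-overlapping
    segments and normalized (Cauchy-Schwarz) so values lie in [0, 1].
    Values near 1 indicate consistent quadratic phase coupling across segments.
    """
    segments = [signal[i:i + seg_len]
                for i in range(0, len(signal) - seg_len + 1, seg_len)]
    spectra = [dft(s) for s in segments]
    half = seg_len // 2
    result = {}
    for f1 in range(1, half):
        for f2 in range(1, half - f1):
            num, den1, den2 = 0j, 0.0, 0.0
            for X in spectra:
                prod = X[f1] * X[f2]
                num += prod * X[f1 + f2].conjugate()
                den1 += abs(prod) ** 2
                den2 += abs(X[f1 + f2]) ** 2
            denom = math.sqrt(den1 * den2)
            result[(f1, f2)] = abs(num) / denom if denom > 0 else 0.0
    return result


def features(signal, seg_len=32, n_ceps=8):
    """Hypothetical feature vector: bicoherence mean/std plus low-order cepstral coefficients.

    These summary statistics are an illustrative assumption, not the paper's feature set.
    """
    bic = list(bicoherence(signal, seg_len).values())
    mean_b = sum(bic) / len(bic)
    std_b = math.sqrt(sum((b - mean_b) ** 2 for b in bic) / len(bic))
    ceps = real_cepstrum(signal[:seg_len])[:n_ceps]
    return [mean_b, std_b] + ceps
```

In practice the naive DFT would be replaced by an FFT (for example `numpy.fft.rfft`), and the resulting feature vectors would be passed to a classifier; which classifier the authors use is not stated in the abstract.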