Paper Title
There is more than one kind of robustness: Fooling Whisper with adversarial examples
Paper Authors
Paper Abstract
Whisper is a recent Automatic Speech Recognition (ASR) model displaying impressive robustness to both out-of-distribution inputs and random noise. In this work, we show that this robustness does not carry over to adversarial noise. We show that we can degrade Whisper performance dramatically, or even transcribe a target sentence of our choice, by generating very small input perturbations with a Signal-to-Noise Ratio (SNR) of 35-45 dB. We also show that by fooling the Whisper language detector we can very easily degrade the performance of multilingual models. These vulnerabilities of a widely popular open-source model have practical security implications and emphasize the need for adversarially robust ASR.
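The attack described in the abstract is gradient-based. Below is a minimal untargeted sketch, not the authors' code, assuming the Hugging Face transformers Whisper checkpoints; for simplicity the perturbation is applied to the log-mel input features rather than the raw waveform, so the 35-45 dB SNR budget from the abstract is only approximated by a small L-infinity bound eps. A targeted variant would instead minimize the loss on a chosen target transcription.

```python
# Minimal sketch of an untargeted PGD-style attack on Whisper (illustrative only).
# Assumption: Hugging Face `transformers` implementation; perturbation lives in the
# log-mel feature domain, unlike the paper's waveform-domain attack.
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
model.eval()

def untargeted_pgd(waveform, sampling_rate, transcription, eps=0.02, alpha=0.005, steps=40):
    """Maximize the loss on the reference transcription to degrade Whisper's output."""
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    features = inputs.input_features                          # (1, 80, 3000) log-mel
    labels = processor.tokenizer(transcription, return_tensors="pt").input_ids

    delta = torch.zeros_like(features, requires_grad=True)    # adversarial perturbation
    for _ in range(steps):
        loss = model(input_features=features + delta, labels=labels).loss
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()                 # gradient *ascent* step
            delta.clamp_(-eps, eps)                            # keep the perturbation small
        delta.grad.zero_()
    return features + delta.detach()

# Usage: transcribe the perturbed features and compare against the clean transcript.
# adv = untargeted_pgd(audio_array, 16_000, "the reference transcription")
# print(processor.batch_decode(model.generate(input_features=adv), skip_special_tokens=True))
```

The same loop with the sign of the update flipped and a target sentence substituted for the reference transcription gives the targeted attack mentioned in the abstract; the language-detector attack works analogously by optimizing the perturbation against the model's language-identification logits.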