Speaker De-identification System using Autoencoders and Adversarial Training

Paper title

Speaker De-identification System using Autoencoders and Adversarial Training

Authors

Fernando M. Espinoza-Cuadros, Juan M. Perero-Codosero, Javier Antón-Martín, Luis A. Hernández-Gómez

Abstract

The rapid growth of web services and mobile apps that collect personal data from users increases the risk that user privacy may be severely compromised. In particular, the increasing variety of spoken language interfaces and voice assistants, empowered by rapid breakthroughs in Deep Learning, is prompting important concerns in the European Union about preserving speech data privacy. For instance, an attacker can record speech from users and impersonate them to gain access to systems requiring voice identification. Hacking users' speaker profiles is also possible with existing technology that extracts speaker, linguistic (e.g., dialect) and paralinguistic (e.g., age) features from the speech signal. To mitigate these weaknesses, in this paper we propose a speaker de-identification system based on adversarial training and autoencoders that suppresses speaker, gender, and accent information from speech. Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system while preserving the intelligibility of the anonymized spoken content.
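The abstract reports its privacy result as an increase in the equal error rate (EER) of a speaker verification system. For readers unfamiliar with the metric, the following is a minimal sketch of how an EER could be computed from verification scores; it is not the authors' evaluation code. The function name `equal_error_rate` and the toy scores are assumptions made purely for illustration, and the sketch assumes that a higher score means "more likely the same speaker".

```python
import numpy as np


def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER from two lists of verification scores.

    Hypothetical helper for illustration only. Assumes a higher score
    means the system believes the two utterances share a speaker.
    Returns (eer, threshold).
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Try every observed score as a candidate decision threshold.
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    # False rejection rate: same-speaker trials scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: different-speaker trials scored at or above it.
    far = np.array([(impostor >= t).mean() for t in thresholds])

    # The EER is the operating point where the two error rates meet;
    # with finitely many trials, take the closest pair and average them.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


# Toy scores (invented, not experimental results): the verifier separates
# the original speech well, but is confused after de-identification.
eer_original, _ = equal_error_rate([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.4])
eer_anonymized, _ = equal_error_rate([0.9, 0.3, 0.7, 0.2], [0.1, 0.8, 0.4, 0.6])
print(f"EER original:   {eer_original:.2f}")
print(f"EER anonymized: {eer_anonymized:.2f}")
```

Under this reading, a higher EER on de-identified speech means the verifier finds it harder to tell speakers apart, which is the direction of change the abstract describes.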
