Paper Title

Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using $β$-VAE

Authors

Hui Lu, Disong Wang, Xixin Wu, Zhiyong Wu, Xunying Liu, Helen Meng

Abstract

We propose an unsupervised learning method to disentangle speech into content representation and speaker identity representation. We apply this method to the challenging one-shot cross-lingual voice conversion task to demonstrate the effectiveness of the disentanglement. Inspired by $β$-VAE, we introduce a learning objective that balances between the information captured by the content and speaker representations. In addition, the inductive biases from the architectural design and the training dataset further encourage the desired disentanglement. Both objective and subjective evaluations show the effectiveness of the proposed method in speech disentanglement and in one-shot cross-lingual voice conversion.
