Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement

Paper title

Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement

Authors

Hamed Hemati, Damian Borth

Abstract

Recent neural Text-to-Speech (TTS) models have been shown to perform very well when enough data is available. However, fine-tuning them for new speakers or languages is not straightforward in a low-resource setup. In this paper, we show that by applying minor modifications to a Tacotron model, one can transfer an existing TTS model for new speakers from the same or a different language using only 20 minutes of data. For this purpose, we first introduce a base multi-lingual Tacotron with language-agnostic input, then demonstrate how transfer learning is done for different scenarios of speaker adaptation without exploiting any pre-trained speaker encoder or code-switching technique. We evaluate the transferred model in both subjective and objective ways.
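The "language-agnostic input" mentioned in the abstract can be illustrated with a small sketch. It assumes text has already been converted to IPA and that every language shares one symbol table, so a single embedding layer could serve all of them. The toy lexicon, the helper names `build_symbol_table` and `encode`, and the special tokens are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
# Toy sketch of a shared IPA symbol table for multilingual TTS input.
# The lexicon entries below are illustrative assumptions, not the paper's data.
TOY_LEXICON = {
    "en": {"cat": "kæt", "ten": "tɛn"},
    "de": {"nett": "nɛt", "kann": "kan"},
}

PAD, UNK = "<pad>", "<unk>"


def build_symbol_table(lexicon):
    """Collect every IPA symbol across all languages into one id table."""
    symbols = sorted(
        {ch for words in lexicon.values() for ipa in words.values() for ch in ipa}
    )
    return {sym: idx for idx, sym in enumerate([PAD, UNK] + symbols)}


def encode(word, lang, lexicon, table):
    """Look up a word's IPA string and map each symbol to its shared id."""
    ipa = lexicon[lang][word]
    return [table.get(ch, table[UNK]) for ch in ipa]


SYMBOLS = build_symbol_table(TOY_LEXICON)
en_ids = encode("ten", "en", TOY_LEXICON, SYMBOLS)   # IPA: t ɛ n
de_ids = encode("nett", "de", TOY_LEXICON, SYMBOLS)  # IPA: n ɛ t
```

In this toy setup the English and German words draw on the same three symbol ids, which is the property that would let a model trained on one language reuse its input embeddings for another.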
