Improve Bilingual TTS Using Dynamic Language and Phonology Embedding

Paper title

Improve Bilingual TTS Using Dynamic Language and Phonology Embedding

Authors

Fengyu Yang, Jian Luan, Yujun Wang

Abstract

In most cases, a bilingual TTS system needs to handle three types of input scripts: first language only, second language only, and second language embedded in the first language. In the latter two situations, the pronunciation and intonation of the second language usually differ considerably because of the influence of the first language. It is therefore a major challenge to model the pronunciation and intonation of the second language accurately in different contexts without mutual interference. This paper builds a Mandarin-English TTS system that produces more standard spoken English from a monolingual Chinese speaker. We introduce a phonology embedding to capture how English differs across phonologies. An embedding mask is applied to the language embedding to distinguish information between languages, and to the phonology embedding to focus on English expression. We also design an embedding strength modulator to capture the dynamic strength of language and phonology. Experiments show that our approach produces significantly more natural and standard spoken English for the monolingual Chinese speaker. Our analysis finds that suitable phonology control contributes to better performance in different scenarios.
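The abstract names three components (a phonology embedding, an embedding mask, and an embedding strength modulator) without giving their formulation. The following is a minimal sketch of how such components could combine per token, not the authors' implementation. Everything in it is an assumption made for illustration: the embedding values, the phonology labels `en_only` and `code_switched`, the sigmoid form of the modulator, the choice to mask the phonology embedding on non-English tokens, and the function names `strength` and `condition`.

```python
import math

# Hypothetical 2-dimensional embeddings; real systems would learn these.
LANG_EMB = {"zh": [0.5, -0.5], "en": [-0.5, 0.5]}
PHON_EMB = {"en_only": [1.0, 0.0], "code_switched": [0.0, 1.0]}


def strength(hidden, weights, bias=0.0):
    """Assumed modulator: sigmoid of a linear projection, giving a value in (0, 1)."""
    z = sum(h * w for h, w in zip(hidden, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def condition(hidden_states, langs, phonology, w_lang, w_phon):
    """Add strength-scaled language and phonology embeddings to each token state.

    hidden_states: list of per-token vectors (lists of floats)
    langs: per-token language tag, "zh" or "en"
    phonology: key into PHON_EMB for the whole utterance
    w_lang, w_phon: projection weights for the two assumed modulators
    """
    phon_vec = PHON_EMB[phonology]
    out = []
    for h, lang in zip(hidden_states, langs):
        # Assumed mask: the phonology embedding only affects English tokens.
        mask = 1.0 if lang == "en" else 0.0
        s_lang = strength(h, w_lang)   # dynamic strength of the language embedding
        s_phon = strength(h, w_phon)   # dynamic strength of the phonology embedding
        lang_vec = LANG_EMB[lang]
        out.append([
            x + s_lang * l + mask * s_phon * p
            for x, l, p in zip(h, lang_vec, phon_vec)
        ])
    return out


# Usage: one Mandarin token followed by one embedded English token.
states = [[0.0, 0.0], [0.0, 0.0]]
result = condition(states, ["zh", "en"], "code_switched", [0.0, 0.0], [0.0, 0.0])
print(result)
```

In this sketch the Mandarin token receives only the language embedding, while the English token also receives the phonology embedding, each scaled by a strength value computed from the token's own state rather than by a fixed weight.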
