Paper title
An Investigation of the Relation Between Grapheme Embeddings and Pronunciation for Tacotron-based Systems
Paper authors
Abstract
End-to-end models, particularly Tacotron-based ones, are currently a popular solution for text-to-speech synthesis. They allow the production of high-quality synthesized speech with little to no text preprocessing. Indeed, they can be trained using either graphemes or phonemes as input directly. However, in the case of grapheme inputs, little is known concerning the relation between the underlying representations learned by the model and word pronunciations. This work investigates this relation in the case of a Tacotron model trained on French graphemes. Our analysis shows that grapheme embeddings are related to phoneme information despite no such information being present during training. Thanks to this property, we show that grapheme embeddings learned by Tacotron models can be useful for tasks such as grapheme-to-phoneme conversion and control of the pronunciation in synthetic speech.