Paper Title
Speech-to-Singing Conversion in an Encoder-Decoder Framework
Paper Authors
Paper Abstract
In this paper, our goal is to convert a set of spoken lines into sung ones. Unlike previous methods based on signal processing, we take a learning-based approach to the problem. This allows us to automatically model various aspects of this transformation, thus overcoming the dependence on specific inputs such as high-quality singing templates or phoneme-score synchronization information. Specifically, we propose an encoder-decoder framework for our task. Given time-frequency representations of speech and a target melody contour, we learn encodings that enable us to synthesize singing that preserves the linguistic content and timbre of the speaker while adhering to the target melody. We also propose an objective based on multi-task learning to improve lyric intelligibility. We present a quantitative and qualitative analysis of our framework.
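The abstract does not spell out the form of the multi-task objective, so the following is only a rough sketch of one common way such an objective is built: a main reconstruction loss on the synthesized time-frequency representation plus a weighted auxiliary loss. The choice of an L1 reconstruction term, a frame-level phoneme classifier as the auxiliary task, the weight `aux_weight`, and every function name here are assumptions made for illustration, not details taken from the paper.

```python
import math

# Hypothetical sketch: all names, loss choices, and the weight are assumptions,
# not the paper's actual objective.

def l1_loss(pred, target):
    """Mean absolute error between two 2-D lists shaped (frames x frequency bins)."""
    total = 0.0
    count = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            total += abs(p - t)
            count += 1
    return total / count

def cross_entropy(logits, labels):
    """Mean per-frame cross-entropy; logits is (frames x classes), labels are class indices."""
    total = 0.0
    for row, label in zip(logits, labels):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[label]
    return total / len(labels)

def multi_task_loss(pred_spec, target_spec, phoneme_logits, phoneme_labels, aux_weight=0.1):
    """Reconstruction loss plus a weighted auxiliary classification loss (assumed form)."""
    main = l1_loss(pred_spec, target_spec)
    aux = cross_entropy(phoneme_logits, phoneme_labels)
    return main + aux_weight * aux
```

In a sketch like this, `aux_weight` trades off melody and timbre reconstruction against the intelligibility-oriented auxiliary task; a suitable value would have to be tuned and is not given in the abstract.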