A Diffeomorphic Flow-based Variational Framework for Multi-speaker Emotion Conversion

Paper title

A Diffeomorphic Flow-based Variational Framework for Multi-speaker Emotion Conversion

Authors

Ravi Shankar, Hsi-Wei Hsieh, Nicolas Charon, Archana Venkataraman

Abstract

This paper introduces a new framework for non-parallel emotion conversion in speech. Our framework is based on two key contributions. First, we propose a stochastic version of the popular CycleGAN model. Our modified loss function introduces a Kullback-Leibler (KL) divergence term that aligns the source and target data distributions learned by the generators, thus overcoming the limitations of sample-wise generation. By using a variational approximation to this stochastic loss function, we show that our KL divergence term can be implemented via a paired density discriminator. We term this new architecture a variational CycleGAN (VCGAN). Second, we model the prosodic features of the target emotion as a smooth and learnable deformation of the source prosodic features. This approach provides implicit regularization, which yields better range alignment for unseen and out-of-distribution speakers. We conduct rigorous experiments and comparative studies to demonstrate that our proposed framework is fairly robust and achieves high performance compared with several state-of-the-art baselines.
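As a rough illustration of the quantity the abstract refers to, the sketch below computes the KL divergence KL(p || q) between two discrete distributions. It is not the VCGAN loss, which the abstract says is implemented through a paired density discriminator. The function name `kl_divergence` and the two histograms are hypothetical and chosen only for illustration.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # a term with zero mass under p contributes nothing
        if qi == 0.0:
            return math.inf  # p places mass where q has none
        total += pi * math.log(pi / qi)
    return total


# Hypothetical histograms of a binned prosodic feature (assumed data, not from the paper).
source_hist = [0.1, 0.4, 0.4, 0.1]
target_hist = [0.25, 0.25, 0.25, 0.25]

if __name__ == "__main__":
    print("KL(source || target) =", kl_divergence(source_hist, target_hist))
    print("KL(source || source) =", kl_divergence(source_hist, source_hist))
```

The divergence is zero when the two distributions coincide and grows as they move apart, which is why a term of this kind can act as a distribution-alignment penalty.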
