Paper title

Emotion Intensity and its Control for Emotional Voice Conversion

Authors

Kun Zhou, Berrak Sisman, Rajib Rana, Björn W. Schuller, Haizhou Li

Abstract

Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity. In EVC, emotions are usually treated as discrete categories, overlooking the fact that speech also conveys emotions at various intensity levels that the listener can perceive. In this paper, we aim to explicitly characterize and control the intensity of emotion. We propose to disentangle the speaker style from linguistic content and encode the speaker style into a style embedding in a continuous space that forms the prototype of emotion embedding. We further learn the actual emotion encoder from an emotion-labelled database and study the use of relative attributes to represent fine-grained emotion intensity. To ensure emotional intelligibility, we incorporate an emotion classification loss and an emotion embedding similarity loss into the training of the EVC network. As desired, the proposed network controls the fine-grained emotion intensity in the output speech. Through both objective and subjective evaluations, we validate the effectiveness of the proposed network for emotional expressiveness and emotion intensity control.
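"Relative attributes" (Parikh and Grauman) means learning a ranking function whose score grows with how strongly an attribute, here emotion intensity, is present. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation. It assumes utterance-level acoustic feature vectors and treats every emotional utterance as more intense than every neutral one. It then fits a linear ranker by gradient descent on a squared-hinge RankSVM-style objective and min-max normalizes the scores to [0, 1]. The function names (`train_ranker`, `make_intensity_fn`), hyperparameters, and toy data are all assumptions.

```python
"""Hypothetical sketch: a relative-attribute ranker for emotion intensity.

Assumptions (not taken from the paper's code): utterance-level acoustic
feature vectors, every emotional utterance ranked above every neutral one,
and a linear ranker trained with a squared-hinge RankSVM-style objective.
"""


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]


def train_ranker(neutral, emotional, c=1.0, lr=0.01, epochs=500):
    """Learn w such that dot(w, emotional) > dot(w, neutral)."""
    dim = len(neutral[0])
    w = [0.0] * dim
    # Ordered pairs: (more intense) - (less intense).
    ordered = [sub(e, n) for e in emotional for n in neutral]
    # Similar pairs: utterances within the same class should score alike.
    similar = [sub(a, b) for group in (neutral, emotional)
               for i, a in enumerate(group) for b in group[i + 1:]]
    for _ in range(epochs):
        grad = list(w)  # gradient of the regulariser 0.5 * ||w||^2
        for d in ordered:
            margin = 1.0 - dot(w, d)
            if margin > 0:  # gradient of c * max(0, 1 - w.d)^2
                for k in range(dim):
                    grad[k] -= 2 * c * margin * d[k]
        for d in similar:
            s = dot(w, d)  # gradient of c * (w.d)^2
            for k in range(dim):
                grad[k] += 2 * c * s * d[k]
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def make_intensity_fn(w, train_features):
    """Min-max normalise ranking scores into an intensity value in [0, 1]."""
    scores = [dot(w, x) for x in train_features]
    lo, hi = min(scores), max(scores)
    span = hi - lo if hi > lo else 1.0  # avoid division by zero

    def intensity(x):
        return min(1.0, max(0.0, (dot(w, x) - lo) / span))

    return intensity


if __name__ == "__main__":
    # Toy 2-D "features", invented purely for illustration.
    neutral = [[0.1, 0.0], [0.2, 0.1], [0.0, 0.2]]
    emotional = [[1.0, 0.9], [0.9, 1.1], [1.2, 1.0]]
    w = train_ranker(neutral, emotional)
    intensity = make_intensity_fn(w, neutral + emotional)
    for x in neutral + emotional + [[0.55, 0.55]]:
        print(x, round(intensity(x), 2))
```

In this toy setup, a score near 0 marks neutral-like input and a score near 1 marks strongly emotional input. In an EVC system, a scalar of this kind could serve as an intensity control signal. How the paper actually conditions its decoder on intensity is not reproduced here.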
