Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units

Paper Title

Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units

Authors

Gallil Maimon, Yossi Adi

Abstract

We introduce DISSC, a novel, lightweight method that converts the rhythm, pitch contour and timbre of a recording to a target speaker in a textless manner. Unlike DISSC, most voice conversion (VC) methods focus primarily on timbre, and ignore people's unique speaking style (prosody). The proposed approach uses a pretrained, self-supervised model for encoding speech to discrete units, which makes it simple, effective, and fast to train. All conversion modules are trained only on reconstruction-like tasks, and are thus suitable for any-to-many VC with no paired data. We introduce a suite of quantitative and qualitative evaluation metrics for this setup, and empirically demonstrate that DISSC significantly outperforms the evaluated baselines. Code and samples are available at https://pages.cs.huji.ac.il/adiyoss-lab/dissc/.
