Exploration of A Self-Supervised Speech Model: A Study on Emotional Corpora

Paper Title

Exploration of A Self-Supervised Speech Model: A Study on Emotional Corpora

Authors

Li, Yuanchao; Mohamied, Yumnah; Bell, Peter; Lai, Catherine

Abstract

Self-supervised speech models have grown fast during the past few years and have proven feasible for use in various downstream tasks. Some recent work has started to look at the characteristics of these models, yet many concerns have not been fully addressed. In this work, we conduct a study on emotional corpora to explore a popular self-supervised model, wav2vec 2.0. Via a set of quantitative analyses, we mainly demonstrate that: 1) wav2vec 2.0 appears to discard paralinguistic information that is less useful for word recognition purposes; 2) for emotion recognition, representations from the middle layer alone perform as well as those derived from layer averaging, while the final layer results in the worst performance in some cases; 3) current self-supervised models may not be the optimal solution for downstream tasks that make use of non-lexical features. Our work provides novel findings that will aid future research in this area and a theoretical basis for the use of existing models.
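The three layer choices compared in finding 2 can be illustrated with a minimal sketch. It is not the authors' code. It assumes that hidden states are available as an array of shape (layers, frames, dimensions), that utterance-level vectors are formed by mean pooling over time, and that the "middle" layer is the one at index `num_layers // 2`. The function name `pool_layers` is hypothetical, and random numbers stand in for real wav2vec 2.0 outputs.

```python
import numpy as np


def pool_layers(hidden_states: np.ndarray) -> dict:
    """Build utterance-level vectors from per-layer frame features.

    hidden_states: array of shape (num_layers, num_frames, dim).
    Mean pooling over time is an assumption of this sketch.
    """
    # Average over the time axis: one vector per layer, shape (num_layers, dim).
    per_layer = hidden_states.mean(axis=1)
    # Assumed definition of "middle": the layer at index num_layers // 2.
    middle_index = per_layer.shape[0] // 2
    return {
        "middle_layer": per_layer[middle_index],
        "layer_average": per_layer.mean(axis=0),
        "final_layer": per_layer[-1],
    }


# Stand-in data: 12 layers, 50 frames, 768 dimensions (random, not real speech).
rng = np.random.default_rng(0)
hidden_states = rng.standard_normal((12, 50, 768))
features = pool_layers(hidden_states)
```

Each entry in `features` could then be fed to the same emotion classifier to compare the three choices; which one works best is an empirical question that this sketch does not answer.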
