Paper Title


Self-Supervised learning with cross-modal transformers for emotion recognition

Paper Authors

Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram

Paper Abstract


Emotion recognition is a challenging task due to limited availability of in-the-wild labeled datasets. Self-supervised learning has shown improvements on tasks with limited labeled datasets in domains like speech and natural language. Models such as BERT learn to incorporate context in word embeddings, which translates to improved performance in downstream tasks like question answering. In this work, we extend self-supervised training to multi-modal applications. We learn multi-modal representations using a transformer trained on the masked language modeling task with audio, visual and text features. This model is fine-tuned on the downstream task of emotion recognition. Our results on the CMU-MOSEI dataset show that this pre-training technique can improve the emotion recognition performance by up to 3% compared to the baseline.
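The masked-modeling pre-training described above can be illustrated schematically: a fraction of the input feature frames (drawn from the audio, visual, and text streams) is replaced by a mask embedding, and the transformer is trained to reconstruct the originals from the surrounding context. A minimal sketch of the masking step, assuming BERT's 15% masking rate; the function names, the rate, and the placeholder mask vector are illustrative assumptions, not details stated in the abstract:

```python
import random

def choose_masked_indices(num_frames, mask_prob=0.15, seed=0):
    # BERT-style selection: independently mask each frame with
    # probability mask_prob (0.15 is borrowed from BERT, not a
    # value given in the paper).
    rng = random.Random(seed)
    return sorted(i for i in range(num_frames) if rng.random() < mask_prob)

def apply_mask(frames, masked_indices, mask_vector):
    # Replace the selected multi-modal feature frames with a
    # placeholder mask vector; during pre-training the model is
    # asked to reconstruct the original frames at these positions.
    out = list(frames)
    for i in masked_indices:
        out[i] = mask_vector
    return out
```

After pre-training on this reconstruction objective, the same transformer is fine-tuned end-to-end on the labeled emotion-recognition data, as the abstract describes.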
