End-to-end Triplet Loss based Emotion Embedding System for Speech Emotion Recognition

Paper Title

End-to-end Triplet Loss based Emotion Embedding System for Speech Emotion Recognition

Authors

Puneet Kumar, Sidharth Jain, Balasubramanian Raman, Partha Pratim Roy, Masakazu Iwamura

Abstract

This paper proposes an end-to-end neural embedding system for speech emotion recognition, based on triplet loss and residual learning. The system learns embeddings from the emotional information in speech utterances and uses them to recognize the emotions portrayed by speech samples of various lengths. It implements a Residual Neural Network architecture and is trained using softmax pre-training followed by a triplet loss function. The weights between the fully connected and embedding layers of the trained network are used to calculate the embedding values. The embedding representations of the various emotions are mapped onto a hyperplane, and the angles among them are computed using cosine similarity. These angles are used to classify a new speech sample into its appropriate emotion class. The proposed system achieves 91.67% and 64.44% accuracy when recognizing emotions on the RAVDESS and IEMOCAP datasets, respectively.
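The two ideas named in the abstract, a triplet loss and angle-based classification with cosine similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine-based form of the loss, the margin value of 0.2, the helper names, and the idea of one reference embedding per emotion class are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on cosine similarity (assumed form).

    The loss is zero once the anchor is more similar to the positive
    (same emotion) than to the negative (different emotion) by at
    least `margin`. The margin value is an illustrative assumption.
    """
    sim_pos = cosine_similarity(anchor, positive)
    sim_neg = cosine_similarity(anchor, negative)
    return max(0.0, sim_neg - sim_pos + margin)


def classify(embedding, class_embeddings):
    """Return the label whose reference embedding forms the smallest angle.

    `class_embeddings` is a hypothetical dict mapping an emotion label
    to a single reference embedding for that class.
    """
    def angle(label):
        sim = cosine_similarity(embedding, class_embeddings[label])
        # Clamp to [-1, 1] to guard against floating-point drift.
        return math.acos(max(-1.0, min(1.0, sim)))

    return min(class_embeddings, key=angle)
```

For example, with two-dimensional toy embeddings `{"happy": [1.0, 0.0], "sad": [0.0, 1.0]}`, a sample embedded at `[0.9, 0.1]` forms the smaller angle with the "happy" reference and is assigned that label.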
