Emotion Recognition in Persian Speech Using Deep Neural Networks

Paper Title

Emotion Recognition In Persian Speech Using Deep Neural Networks

Authors

Yazdani, Ali, Simchi, Hossein, Shekofteh, Yasser

Abstract

Speech Emotion Recognition (SER) is of great importance in Human-Computer Interaction (HCI), as it provides a deeper understanding of the situation and results in better interaction. In recent years, various machine learning and Deep Learning (DL) algorithms have been developed to improve SER techniques. Recognition of spoken emotions depends on the type of expression, which varies between languages. In this paper, to further study important factors in the Farsi language, we examine various DL techniques on a Farsi/Persian dataset, the Sharif Emotional Speech Database (ShEMO), which was released in 2018. Using signal features in low- and high-level descriptions and different deep neural networks and machine learning techniques, an Unweighted Accuracy (UA) of 65.20% and a Weighted Accuracy (WA) of 78.29% are achieved.
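
The abstract reports both UA and WA without defining them. As a rough illustration, the sketch below assumes the convention common in SER work: WA is the overall fraction of correctly classified utterances, and UA is the mean of the per-class recalls, so each emotion class counts equally regardless of its size. The abstract does not confirm that the authors use exactly these definitions, and the function names and toy labels are hypothetical.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall fraction of correct predictions (assumed definition of WA)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls (assumed definition of UA)."""
    total = defaultdict(int)  # utterances per true class
    hit = defaultdict(int)    # correctly classified utterances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            hit[t] += 1
    return sum(hit[c] / total[c] for c in total) / len(total)


# Toy, imbalanced label set (illustrative only, not ShEMO data).
y_true = ["anger"] * 6 + ["neutral"] * 3 + ["fear"] * 1
y_pred = ["anger"] * 6 + ["neutral", "neutral", "anger"] + ["anger"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(f"WA = {wa:.4f}, UA = {ua:.4f}")
```

On an imbalanced label set like this toy one, WA is higher than UA because the majority class dominates the overall count. That is one plausible reading of the gap between the reported 78.29% WA and 65.20% UA.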
