Paper Title
Audio-video Emotion Recognition in the Wild using Deep Hybrid Networks
Paper Authors
Paper Abstract
This paper presents an audiovisual-based emotion recognition hybrid network. While most previous work focuses on using either deep models or hand-engineered features extracted from images, we explore multiple deep models built on both images and audio signals. Specifically, in addition to convolutional neural networks (CNN) and recurrent neural networks (RNN) trained on facial images, the hybrid network also contains one SVM classifier trained on holistic acoustic feature vectors, one long short-term memory network (LSTM) trained on short-term feature sequences extracted from segmented audio clips, and one Inception(v2)-LSTM network trained on image-like maps built from short-term acoustic feature sequences. Experimental results show that the proposed hybrid network outperforms the baseline method by a large margin.
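The abstract does not specify how the outputs of the five models are combined. A common choice for such hybrid networks is score-level fusion of per-model class probabilities; the sketch below illustrates that idea only. The branch names, fusion weights, and the seven-class emotion label set are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical emotion label set (7 classes); assumed, not taken from the paper.
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]


def fuse_scores(branch_probs, weights):
    """Score-level fusion: weighted average of per-branch class probabilities.

    branch_probs: dict mapping branch name -> (num_videos, num_classes) array
                  of class scores, one row per test video.
    weights:      dict mapping branch name -> scalar fusion weight.
    """
    total_w = sum(weights[name] for name in branch_probs)
    fused = sum(weights[name] * probs for name, probs in branch_probs.items())
    return fused / total_w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_videos, num_classes = 4, len(EMOTIONS)

    # Stand-in probabilities for the five branches described in the abstract.
    branches = ["cnn_face", "rnn_face", "svm_acoustic",
                "lstm_audio", "inception_lstm_map"]
    branch_probs = {}
    for name in branches:
        logits = rng.normal(size=(num_videos, num_classes))
        branch_probs[name] = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

    # Fusion weights are hypothetical; in practice they would be tuned on validation data.
    weights = {name: 1.0 for name in branches}
    fused = fuse_scores(branch_probs, weights)
    print([EMOTIONS[i] for i in fused.argmax(axis=1)])
```

Equal weights reduce this to simple score averaging; tuning the per-branch weights on a validation set is the usual way to let stronger modalities dominate the final prediction.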