Paper Title


Exploring the contextual factors affecting multimodal emotion recognition in videos

Authors

Prasanta Bhattacharya, Raj Kumar Gupta, Yinping Yang

Abstract


Emotional expressions form a key part of user behavior on today's digital platforms. While multimodal emotion recognition techniques are gaining research attention, there is a lack of deeper understanding of how visual and non-visual features can be used to better recognize emotions in certain contexts but not others. This study analyzes the effects of multimodal emotion features derived from facial expressions, tone, and text in conjunction with two key contextual factors: i) the gender of the speaker, and ii) the duration of the emotional episode. Using a large public dataset of 2,176 manually annotated YouTube videos, we found that while multimodal features consistently outperformed bimodal and unimodal features, their performance varied significantly across different emotion, gender, and duration contexts. Multimodal features performed notably better for male speakers in recognizing most emotions. Furthermore, multimodal features performed notably better for shorter videos than for longer ones in recognizing neutrality and happiness, but not sadness and anger. These findings offer new insights toward the development of more context-aware emotion recognition and empathetic systems.
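To make the study design concrete, below is a minimal sketch of the comparison the abstract describes: unimodal feature vectors (face, tone, text) are fused into bimodal and multimodal combinations, and each combination is scored separately per contextual subgroup (speaker gender, video duration). The feature extractors, feature dimensions, synthetic data, and the logistic-regression classifier are all illustrative assumptions for demonstration, not the authors' actual pipeline.

```python
# Sketch: compare unimodal/bimodal/multimodal feature fusion, scored per
# contextual subgroup (gender, duration). Data here is synthetic; real
# systems would substitute actual facial/acoustic/text feature extractors.
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n = 2176  # dataset size from the abstract; the features below are random stand-ins

# Hypothetical per-video unimodal features (assumed shapes, not from the paper).
features = {
    "face": rng.normal(size=(n, 32)),   # e.g., facial-expression statistics
    "tone": rng.normal(size=(n, 16)),   # e.g., prosodic/acoustic statistics
    "text": rng.normal(size=(n, 64)),   # e.g., transcript embeddings
}
labels = rng.integers(0, 4, size=n)          # neutral / happy / sad / angry
gender = rng.choice(["male", "female"], n)   # contextual factor i
duration = rng.choice(["short", "long"], n)  # contextual factor ii

idx_train, idx_test = train_test_split(np.arange(n), test_size=0.3, random_state=0)

for k in (1, 2, 3):  # unimodal, bimodal, multimodal
    for combo in combinations(features, k):
        X = np.hstack([features[m] for m in combo])  # simple early (concatenation) fusion
        clf = LogisticRegression(max_iter=1000).fit(X[idx_train], labels[idx_train])
        pred = clf.predict(X[idx_test])
        # Score each contextual subgroup separately, mirroring the study design.
        for name, groups in (("gender", gender), ("duration", duration)):
            for g in np.unique(groups):
                mask = groups[idx_test] == g
                score = f1_score(labels[idx_test][mask], pred[mask], average="macro")
                print(f"{'+'.join(combo):15s} {name}={g:6s} macro-F1={score:.3f}")
```

On real features, comparing the per-subgroup scores of the three-modality model against its bimodal and unimodal counterparts is one straightforward way to surface the kind of context-dependent performance gaps the paper reports.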
