Paper Title

InterMulti:Multi-view Multimodal Interactions with Text-dominated Hierarchical High-order Fusion for Emotion Analysis

Paper Authors

Feng Qiu, Wanzeng Kong, Yu Ding

Paper Abstract

Humans are sophisticated at reading interlocutors' emotions from multimodal signals, such as speech content, voice tone, and facial expressions. However, machines may struggle to understand various emotions because of the difficulty of effectively decoding emotions from the complex interactions between multimodal signals. In this paper, we propose a multimodal emotion analysis framework, InterMulti, to capture complex multimodal interactions from different views and identify emotions from multimodal signals. Our proposed framework decomposes signals of different modalities into three kinds of multimodal interaction representations: a modality-full interaction representation, a modality-shared interaction representation, and three modality-specific interaction representations. Additionally, to balance the contributions of different modalities and learn a more informative latent interaction representation, we develop a novel Text-dominated Hierarchical High-order Fusion (THHF) module. The THHF module integrates the above three kinds of representations into a comprehensive multimodal interaction representation. Extensive experimental results on widely used datasets (i.e., MOSEI, MOSI, and IEMOCAP) demonstrate that our method outperforms state-of-the-art methods.
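
The abstract describes the decomposition and fusion only at a high level. As a rough illustration of how such a three-way decomposition (modality-specific, modality-shared, modality-full) and a text-dominated fusion might be wired up, below is a minimal, hypothetical PyTorch sketch. The feature dimensions (e.g., 768-d text, 74-d audio, 35-d visual), layer choices, gating scheme, and output size are assumptions for illustration, not the paper's actual implementation.

import torch
import torch.nn as nn

class InterMultiSketch(nn.Module):
    """Hypothetical sketch of InterMulti's three-way decomposition plus a
    text-dominated fusion. All dimensions, layers, and the gating scheme
    are assumptions; the paper does not publish this code."""

    def __init__(self, d_text=768, d_audio=74, d_visual=35, d_model=128, n_out=7):
        super().__init__()
        # Project each modality into a common latent space.
        self.proj = nn.ModuleDict({
            "t": nn.Linear(d_text, d_model),
            "a": nn.Linear(d_audio, d_model),
            "v": nn.Linear(d_visual, d_model),
        })
        # Three modality-specific interaction encoders (one per modality).
        self.specific = nn.ModuleDict({m: nn.Linear(d_model, d_model) for m in "tav"})
        # One encoder with weights shared across modalities; averaging its
        # outputs yields the modality-shared interaction representation.
        self.shared = nn.Linear(d_model, d_model)
        # The modality-full representation mixes all modalities jointly.
        self.full = nn.Linear(3 * d_model, d_model)
        # Text-dominated fusion: the text stream gates every other view
        # (an element-wise, second-order interaction) before a final mix.
        self.text_gate = nn.Linear(d_model, d_model)
        self.fuse = nn.Linear(5 * d_model, d_model)
        self.head = nn.Linear(d_model, n_out)

    def forward(self, x_t, x_a, x_v):
        h = {m: torch.relu(self.proj[m](x)) for m, x in zip("tav", (x_t, x_a, x_v))}
        spec = {m: torch.relu(self.specific[m](h[m])) for m in "tav"}
        shared = torch.stack([torch.relu(self.shared(h[m])) for m in "tav"]).mean(0)
        full = torch.relu(self.full(torch.cat([h["t"], h["a"], h["v"]], dim=-1)))
        gate = torch.sigmoid(self.text_gate(h["t"]))
        fused = self.fuse(torch.cat(
            [spec["t"], gate * spec["a"], gate * spec["v"],
             gate * shared, gate * full], dim=-1))
        return self.head(torch.relu(fused))

# Toy usage with already-pooled, batch-level features.
model = InterMultiSketch()
logits = model(torch.randn(4, 768), torch.randn(4, 74), torch.randn(4, 35))
print(logits.shape)  # torch.Size([4, 7])

Note that the element-wise gating above is only a stand-in for the paper's high-order fusion; the actual THHF module presumably realizes higher-order interactions differently.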
