Paper Title
Holistic Visual-Textual Sentiment Analysis with Prior Models
Paper Authors
Paper Abstract
Visual-textual sentiment analysis aims to predict sentiment from an input image-text pair, which poses the challenge of learning effective features across diverse input images. To address this, we propose a holistic method that achieves robust visual-textual sentiment analysis by exploiting a rich set of powerful pre-trained visual and textual prior models. The proposed method consists of four parts: (1) a visual-textual branch to learn features directly from data for sentiment analysis, (2) a visual expert branch with a set of pre-trained "expert" encoders to extract selected semantic visual features, (3) a CLIP branch to implicitly model visual-textual correspondence, and (4) a multimodal feature fusion network based on BERT to fuse the multimodal features and make sentiment predictions. Extensive experiments on three datasets show that our method outperforms existing visual-textual sentiment analysis methods.
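For illustration, below is a minimal PyTorch sketch of the four-branch topology the abstract describes. Everything inside it is an assumption made for the sketch, not the authors' implementation: the `Branch` placeholder stands in for the pre-trained encoders (visual-textual, expert, and CLIP), a plain Transformer encoder stands in for the BERT-based fusion network, and the feature width, expert count, and class count are arbitrary.

```python
# Hypothetical sketch of the four-branch architecture from the abstract.
# Encoder internals (backbone CNNs, CLIP, expert detectors, BERT) are
# replaced by placeholder modules; only the fusion topology follows the text.
import torch
import torch.nn as nn

D = 256  # shared fusion width (assumed)

class Branch(nn.Module):
    """Stand-in for a pre-trained encoder: maps its input to one D-dim token."""
    def __init__(self, in_dim: int):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, D), nn.ReLU(), nn.Linear(D, D))

    def forward(self, x):
        return self.proj(x)

class HolisticSentimentModel(nn.Module):
    def __init__(self, num_classes: int = 3, n_experts: int = 4):
        super().__init__()
        # (1) visual-textual branch learning features directly from the data
        self.visual_textual = Branch(in_dim=512)
        # (2) visual expert branch: one token per pre-trained "expert" encoder
        self.experts = nn.ModuleList(Branch(in_dim=512) for _ in range(n_experts))
        # (3) CLIP branch implicitly modelling visual-textual correspondence
        self.clip = Branch(in_dim=512)
        # (4) fusion network: a Transformer encoder over the branch tokens
        #     (standing in for BERT) with a learned [CLS] token for prediction
        self.cls = nn.Parameter(torch.zeros(1, 1, D))
        layer = nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D, num_classes)

    def forward(self, vt_feat, expert_feats, clip_feat):
        toks = [self.visual_textual(vt_feat)]
        toks += [enc(f) for enc, f in zip(self.experts, expert_feats)]
        toks.append(self.clip(clip_feat))
        seq = torch.stack(toks, dim=1)                       # (B, n_tokens, D)
        seq = torch.cat([self.cls.expand(len(seq), -1, -1), seq], dim=1)
        fused = self.fusion(seq)[:, 0]                       # take the [CLS] token
        return self.head(fused)                              # sentiment logits

if __name__ == "__main__":
    model = HolisticSentimentModel()
    B = 2  # toy batch of pre-extracted branch features
    logits = model(torch.randn(B, 512),
                   [torch.randn(B, 512) for _ in range(4)],
                   torch.randn(B, 512))
    print(logits.shape)  # torch.Size([2, 3])
```

The design choice the sketch highlights is that each prior model contributes tokens to a common fusion sequence, so adding or removing an expert encoder only changes the sequence length, not the fusion network itself.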