Intentonomy: a Dataset and Study towards Human Intent Understanding

Paper Title

Intentonomy: a Dataset and Study towards Human Intent Understanding

Authors

Menglin Jia, Zuxuan Wu, Austin Reiter, Claire Cardie, Serge Belongie, Ser-Nam Lim

Abstract

An image is worth a thousand words, conveying information that goes beyond the physical visual content therein. In this paper, we study the intent behind social media images with an aim to analyze how visual information can help the recognition of human intent. Towards this goal, we introduce an intent dataset, Intentonomy, comprising 14K images covering a wide range of everyday scenes. These images are manually annotated with 28 intent categories that are derived from a social psychology taxonomy. We then systematically study whether, and to what extent, commonly used visual information, i.e., object and context, contribute to human motive understanding. Based on our findings, we conduct further study to quantify the effect of attending to object and context classes as well as textual information in the form of hashtags when training an intent classifier. Our results quantitatively and qualitatively shed light on how visual and textual information can produce observable effects when predicting intent.
