Paper Title

Multimodal Word Sense Disambiguation in Creative Practice

Authors

Manuel Ladron de Guevara, Christopher George, Akshat Gupta, Daragh Byrne, Ramesh Krishnamurti

Abstract

Language is ambiguous; many terms and expressions can convey the same idea. This is especially true in creative practice, where ideas and design intents are highly subjective. We present Ambiguous Descriptions of Art Images (ADARI), a dataset of contemporary workpieces that aims to provide a foundational resource for subjective image description and multimodal word disambiguation in the context of creative practice. The dataset contains a total of 240k images labeled with 260k descriptive sentences. It is additionally organized into the sub-domains of architecture, art, design, fashion, furniture, product design, and technology. In subjective image description, labels are not deterministic: for example, the ambiguous label "dynamic" might correspond to hundreds of different images. To understand this complexity, we analyze the ambiguity and relevance of text with respect to images using a state-of-the-art pre-trained BERT model for sentence classification. We provide a baseline for multi-label classification tasks and demonstrate the potential of multimodal approaches for understanding ambiguity in design intentions. We hope that the ADARI dataset and baselines constitute a first step towards subjective label classification.
