Paper Title

Using Natural Language Processing to Predict Costume Core Vocabulary of Historical Artifacts

Authors

Madhuvanti Muralikrishnan, Amr Hilal, Chreston Miller, Dina Smith-Glaviana

Abstract

Historic dress artifacts are a valuable source for human studies. In particular, they can provide important insights into the social aspects of their corresponding era. These insights are commonly drawn from garment pictures as well as the accompanying descriptions, which are usually stored using a standardized, controlled vocabulary that accurately describes garments and costume items, called the Costume Core Vocabulary. Building an accurate Costume Core from garment descriptions can be challenging because historic garment items are often donated, and the accompanying descriptions may be written by untrained individuals and use language common to the period of the items. In this paper, we present an approach that uses Natural Language Processing (NLP) to map free-form text descriptions of historic items to the controlled vocabulary provided by the Costume Core. Despite the limited dataset, we were able to train an NLP model based on the Universal Sentence Encoder to perform this mapping with more than 90% test accuracy for a subset of the Costume Core vocabulary. We describe the methodology, design choices, and development of our approach, and show the feasibility of predicting the Costume Core for unseen descriptions. With more garment descriptions still being curated for training, we expect higher accuracy and better generalizability.
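The core task the abstract describes, mapping a free-form garment description onto the closest term in a controlled vocabulary, can be illustrated with a minimal sketch. The sketch below uses a simple bag-of-words representation and cosine similarity as a stand-in for the paper's Universal Sentence Encoder embeddings; the vocabulary terms and their reference texts are hypothetical examples, not taken from the actual Costume Core.

```python
from collections import Counter
from math import sqrt

# Hypothetical controlled-vocabulary terms, each paired with a short
# reference text (illustrative only; not the actual Costume Core).
COSTUME_CORE_TERMS = {
    "bodice": "fitted upper garment bodice waist",
    "petticoat": "underskirt petticoat worn under a skirt",
    "bonnet": "bonnet head covering hat ties chin",
}

def bag_of_words(text):
    """Tokenize on whitespace and count word occurrences."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict_term(description):
    """Return the vocabulary term whose reference text is most
    similar to the free-form donor description."""
    vec = bag_of_words(description)
    return max(
        COSTUME_CORE_TERMS,
        key=lambda t: cosine(vec, bag_of_words(COSTUME_CORE_TERMS[t])),
    )

print(predict_term("a fitted bodice with a narrow waist"))  # → bodice
```

The paper's approach replaces the bag-of-words vectors with Universal Sentence Encoder embeddings and learns the mapping from labeled examples, which is what allows it to handle period-specific wording that shares no literal tokens with the controlled vocabulary.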
