Paper Title
Is Medieval Distant Viewing Possible? : Extending and Enriching Annotation of Legacy Image Collections using Visual Analytics
Paper Authors
Paper Abstract
Distant viewing approaches have typically used image datasets that are close to the contemporary image data used to train machine learning models. Working with images from other historical periods requires expert-annotated data, and the quality of the labels is crucial for the quality of the results. Annotating data, or re-annotating legacy data, is an arduous task, especially when working with cultural heritage collections that contain myriad uncertainties. In this paper, we describe working with two pre-annotated sets of medieval manuscript images that exhibit conflicting and overlapping metadata. Since a manual reconciliation of the two legacy ontologies would be very expensive, we aim (1) to create a more uniform set of descriptive labels to serve as a "bridge" in the combined dataset, and (2) to establish a high-quality hierarchical classification that can be used as a valuable input for subsequent supervised machine learning. To achieve these goals, we developed visualization and interaction mechanisms that enable medievalists to combine, regularize, and extend the vocabulary used to describe these and other cognate image datasets. The visual interfaces provide experts with an overview of relationships in the data that goes beyond the sum total of the metadata. Word and image embeddings, together with co-occurrences of labels across the datasets, enable batch re-annotation of images and recommendation of label candidates, and support composing a hierarchical classification of labels.
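As a rough illustration of how label co-occurrence could drive the recommendation of label candidates, the following minimal Python sketch counts how often pairs of labels appear on the same image and ranks labels that are not yet attached to an image. It is not the authors' implementation: the function names `cooccurrence_counts` and `recommend_labels`, the scoring rule, and the toy annotations are all assumptions made for this example, and the embedding-based signals mentioned in the abstract are left out.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(annotations):
    """Count how often each unordered pair of labels appears on the same image."""
    counts = Counter()
    for labels in annotations.values():
        # Sort so every pair is stored in one canonical order.
        for a, b in combinations(sorted(set(labels)), 2):
            counts[(a, b)] += 1
    return counts


def recommend_labels(image_labels, counts, top_k=3):
    """Rank labels not yet on the image by total co-occurrence with its labels."""
    present = set(image_labels)
    scores = Counter()
    for (a, b), n in counts.items():
        if a in present and b not in present:
            scores[b] += n
        elif b in present and a not in present:
            scores[a] += n
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ranked[:top_k]]


# Hypothetical toy annotations (image id -> labels); not data from the paper.
annotations = {
    "img1": ["king", "crown", "throne"],
    "img2": ["king", "crown"],
    "img3": ["bishop", "mitre"],
    "img4": ["king", "throne", "sword"],
}

counts = cooccurrence_counts(annotations)
suggestions = recommend_labels(["king"], counts)
print(suggestions)
```

In an expert-facing tool, a ranking of this kind would presumably be shown as a set of suggestions for the medievalist to accept or reject rather than applied automatically.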