Paper Title
Enriching ImageNet with Human Similarity Judgments and Psychological Embeddings
Paper Authors
Paper Abstract
Advances in object recognition flourished in part because of the availability of high-quality datasets and associated benchmarks. However, these benchmarks, such as ILSVRC, are relatively task-specific, focusing predominantly on predicting class labels. We introduce a publicly available dataset that embodies the task-general capabilities of human perception and reasoning. The Human Similarity Judgments extension to ImageNet (ImageNet-HSJ) is composed of human similarity judgments that supplement the ILSVRC validation set. The new dataset supports a range of tasks and performance metrics, including the evaluation of unsupervised learning algorithms. We demonstrate two methods of assessment: using the similarity judgments directly and using a psychological embedding trained on the similarity judgments. This embedding space contains an order of magnitude more points (i.e., images) than previous efforts based on human judgments. Scaling to the full 50,000-image set was made possible by a selective sampling process that used variational Bayesian inference and model ensembles to sample the aspects of the embedding space that were most uncertain. This methodological innovation not only enables scaling but should also improve the quality of solutions by focusing sampling where it is needed. To demonstrate the utility of ImageNet-HSJ, we used the similarity ratings and the embedding space to evaluate how well several popular models conform to human similarity judgments. One finding is that more complex models that perform better on task-specific benchmarks do not conform better to human semantic judgments. In addition to the human similarity judgments, pre-trained psychological embeddings and code for inferring variational embeddings are made publicly available. Collectively, the ImageNet-HSJ assets support the appraisal of internal representations and the development of more human-like models.
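The abstract's evaluation idea, checking how well a model's internal representation conforms to human similarity judgments, can be illustrated with a small sketch. This is not the authors' released code: the embeddings, image IDs, and human ratings below are made-up placeholders, and rank correlation between model-derived pairwise similarities and human ratings is just one reasonable scoring choice.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def _ranks(x):
    """Ranks of the entries of x (no tie handling, for illustration only)."""
    order = np.argsort(x)
    ranks = np.empty(len(x))
    ranks[order] = np.arange(len(x))
    return ranks


def spearman(x, y):
    """Spearman rank correlation, i.e., Pearson correlation on ranks."""
    rx = _ranks(np.asarray(x, dtype=float))
    ry = _ranks(np.asarray(y, dtype=float))
    rx -= rx.mean()
    ry -= ry.mean()
    return float(np.dot(rx, ry) / (np.linalg.norm(rx) * np.linalg.norm(ry)))


def human_agreement(model_embeddings, pairs, human_ratings):
    """Correlate model-derived pairwise similarities with human ratings.

    model_embeddings: dict mapping image id -> feature vector
    pairs: list of (image_id_i, image_id_j) tuples
    human_ratings: one human similarity score per pair
    """
    model_sims = [cosine_similarity(model_embeddings[i], model_embeddings[j])
                  for i, j in pairs]
    return spearman(model_sims, human_ratings)


# Toy example: random "model embeddings" for three hypothetical images and
# invented human ratings; a real evaluation would use ImageNet-HSJ judgments.
rng = np.random.default_rng(0)
emb = {k: rng.normal(size=8) for k in ("img_a", "img_b", "img_c")}
pairs = [("img_a", "img_b"), ("img_a", "img_c"), ("img_b", "img_c")]
ratings = [0.9, 0.2, 0.4]
print(round(human_agreement(emb, pairs, ratings), 3))
```

A higher correlation means the model's similarity structure better matches the human judgments; comparing this score across models is one way the dataset supports model appraisal independent of class-label accuracy.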