Paper Title

Tensor Composition Net for Visual Relationship Prediction

Authors

Yuting Qiang, Yongxin Yang, Xueting Zhang, Yanwen Guo, Timothy M. Hospedales

Abstract

We present a novel Tensor Composition Net (TCN) to predict visual relationships in images. Visual Relationship Prediction (VRP) provides a more challenging test of image understanding than conventional image tagging and is difficult to learn due to a large label-space and incomplete annotation. The key idea of our TCN is to exploit the low-rank property of the visual relationship tensor, so as to leverage correlations within and across objects and relations and make a structured prediction of all visual relationships in an image. To show the effectiveness of our model, we first empirically compare our model with Multi-Label Image Classification (MLIC) methods, eXtreme Multi-label Classification (XMC) methods, and VRD methods. We then show that thanks to our tensor (de)composition layer, our model can predict visual relationships which have not been seen in the training dataset. We finally show our TCN's image-level visual relationship prediction provides a simple and efficient mechanism for relation-based image-retrieval even compared with VRD methods.
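The low-rank idea behind the tensor (de)composition layer can be sketched in a few lines. The following is an illustrative CP-style composition in NumPy with hypothetical dimensions and random weights, not the paper's actual architecture: image features are mapped to a small number of factor vectors per mode (subject, predicate, object), and the full relationship score tensor is composed as a sum of their outer products, which bounds its rank and couples predictions across all relationship triplets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration (not from the paper):
n_obj, n_rel, rank, feat_dim = 5, 4, 3, 16

# Per-mode factor weights: features -> `rank` factor vectors per mode.
W_s = rng.normal(size=(feat_dim, rank, n_obj))  # subject mode
W_p = rng.normal(size=(feat_dim, rank, n_rel))  # predicate mode
W_o = rng.normal(size=(feat_dim, rank, n_obj))  # object mode

def compose(feat):
    """Compose a (subject, predicate, object) score tensor of CP rank <= `rank`."""
    A = np.einsum('d,dro->ro', feat, W_s)  # (rank, n_obj) subject factors
    B = np.einsum('d,drp->rp', feat, W_p)  # (rank, n_rel) predicate factors
    C = np.einsum('d,dro->ro', feat, W_o)  # (rank, n_obj) object factors
    # Sum of rank-1 outer products: T[s,p,o] = sum_r A[r,s] * B[r,p] * C[r,o]
    return np.einsum('rs,rp,ro->spo', A, B, C)

feat = rng.normal(size=feat_dim)  # stand-in for a CNN image feature
T = compose(feat)                 # scores for every (s, p, o) triplet

# The mode-1 unfolding of a CP-rank-r tensor has matrix rank <= r,
# which is the low-rank structure the model exploits.
print(T.shape)                                        # (5, 4, 5)
print(np.linalg.matrix_rank(T.reshape(n_obj, -1)))    # at most `rank`
```

Because every triplet score is built from shared per-mode factors, correlations within and across objects and relations are captured by construction, and scores for unseen (subject, predicate, object) combinations fall out of the same composition.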
