Adaptive Semantic-Visual Tree for Hierarchical Embeddings

Paper Title

Adaptive Semantic-Visual Tree for Hierarchical Embeddings

Authors

Yang, Shuo; Yu, Wei; Zheng, Ying; Yao, Hongxun; Mei, Tao

Abstract

Merchandise categories inherently form a semantic hierarchy with different levels of concept abstraction, especially for fine-grained categories. This hierarchy encodes rich correlations among categories across levels, which can effectively regularize the semantic space and thus make predictions less ambiguous. However, previous studies of fine-grained image retrieval focus primarily on semantic similarities or visual similarities. In a real application, visual similarity alone may not satisfy consumers who search for merchandise with real-life images. For example, given a red coat as the query image, recall based only on visual similarity might return a red suit, since the two are visually similar. But users actually want a coat rather than a suit, even if the coat has different color or texture attributes. We introduce this new problem based on photoshopping in real practice. That is why semantic information is integrated to regularize the margins, so that "semantic" takes priority over "visual". To solve this new problem, we propose a hierarchical adaptive semantic-visual tree (ASVT) to depict the architecture of merchandise categories, which simultaneously evaluates semantic similarities between different semantic levels and visual similarities within the same semantic class. The semantic information satisfies consumers' demand for merchandise similar to the query, while the visual information optimizes the correlations within the semantic class. At each level, we set different margins based on the semantic hierarchy and incorporate them as prior information to learn a fine-grained feature embedding. To evaluate our framework, we propose a new dataset named JDProduct, with hierarchical labels collected from actual image queries and official merchandise images on an online shopping application. Extensive experimental results on the public CARS196 and CUB-
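
The idea of level-dependent margins can be illustrated with a small sketch. This is not the authors' loss, which the abstract does not specify. It is a plain triplet loss in which the margin grows when the negative sample comes from a different coarse category. The two-level hierarchy `PARENT`, the margin values, and the function names are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical two-level hierarchy: fine-grained class -> coarse category.
PARENT = {
    "red_coat": "coat",
    "blue_coat": "coat",
    "red_suit": "suit",
}

# Hypothetical margins: a negative from another coarse category must be
# pushed further away than a negative from the same coarse category.
MARGIN_SAME_COARSE = 0.2
MARGIN_DIFF_COARSE = 0.6


def hierarchical_margin(anchor_label, negative_label):
    """Pick the margin from the level at which the two labels diverge."""
    if PARENT[anchor_label] == PARENT[negative_label]:
        return MARGIN_SAME_COARSE
    return MARGIN_DIFF_COARSE


def triplet_loss(anchor, positive, negative, anchor_label, negative_label):
    """Standard hinge triplet loss with a hierarchy-dependent margin."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    margin = hierarchical_margin(anchor_label, negative_label)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings: the negative sits at the same distance in both cases.
anchor, positive, negative = (0.0, 0.0), (0.1, 0.0), (0.5, 0.0)
loss_same = triplet_loss(anchor, positive, negative, "red_coat", "blue_coat")
loss_diff = triplet_loss(anchor, positive, negative, "red_coat", "red_suit")
print(loss_same, loss_diff)
```

With the negative at the same distance in both calls, only the cross-category case (coat versus suit) incurs a loss, which mirrors the abstract's point that semantic differences should dominate visual closeness.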
