Paper Title
Efficient Cross-Modal Retrieval via Deep Binary Hashing and Quantization
Paper Authors
Paper Abstract
Cross-modal retrieval aims to search for data with similar semantic meanings across different content modalities. However, cross-modal retrieval requires substantial storage and retrieval time, since it must process data in multiple modalities. Existing works focus on learning single-source compact features, such as binary hash codes, that preserve similarities between different modalities. In this work, we propose a jointly learned deep hashing and quantization network (HQ) for cross-modal retrieval. We simultaneously learn binary hash codes and quantization codes that preserve semantic information across multiple modalities with an end-to-end deep learning architecture. At the retrieval step, binary hashing is used to retrieve a subset of items from the search space, and quantization is then used to re-rank the retrieved items. We show, both theoretically and empirically, that this two-stage retrieval approach provides faster retrieval while preserving accuracy. Experimental results on the NUS-WIDE, MIR-Flickr, and Amazon datasets demonstrate that HQ improves precision by more than 7% over supervised neural-network-based compact coding models.
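To make the two-stage retrieval concrete, below is a minimal NumPy sketch of the search procedure the abstract describes: binary hash codes first filter the database down to a candidate set via Hamming distance, then quantization codes re-rank those candidates by distance between the query feature and their reconstructed vectors. All names and shapes here (`hamming_distances`, `two_stage_retrieval`, the product-quantization-style `codebook`) are illustrative assumptions, not identifiers or design details from the paper.

```python
# A minimal sketch of two-stage retrieval (hash filter + quantization re-rank),
# assuming hash codes are stored as packed uint8 bit arrays and quantization
# codes index a per-subspace codebook. Illustrative only.
import numpy as np

def hamming_distances(query_code: np.ndarray, db_codes: np.ndarray) -> np.ndarray:
    """Hamming distance between one packed uint8 hash code and N database codes."""
    # XOR isolates differing bits; unpack to bits and count them per item.
    xor = np.bitwise_xor(db_codes, query_code)        # (N, n_bytes)
    return np.unpackbits(xor, axis=1).sum(axis=1)     # (N,)

def two_stage_retrieval(query_hash, query_feat, db_hashes, db_quant_codes,
                        codebook, radius=8, top_k=10):
    """Stage 1: Hamming-radius filter on binary hash codes.
    Stage 2: re-rank the surviving candidates by quantized distance."""
    # Stage 1: keep only items whose hash code is within `radius` bits
    # of the query's hash code.
    dists = hamming_distances(query_hash, db_hashes)
    candidates = np.flatnonzero(dists <= radius)

    # Stage 2: reconstruct each candidate from its quantization codes
    # (product-quantization style: M subspaces, K centroids each) and
    # rank by squared distance to the query's continuous feature.
    # db_quant_codes: (N, M) centroid indices; codebook: (M, K, d // M).
    M = db_quant_codes.shape[1]
    recon = np.concatenate(
        [codebook[m, db_quant_codes[candidates, m]] for m in range(M)], axis=1)
    sq_dist = ((recon - query_feat) ** 2).sum(axis=1)
    order = np.argsort(sq_dist)[:top_k]
    return candidates[order]
```

The Hamming filter is cheap (bitwise XOR plus popcount over the whole database), so only the small candidate set pays the cost of codebook lookups and floating-point distances, which is the source of the speedup claimed for the two-stage approach.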