MultiRes-NetVLAD: Augmenting Place Recognition Training with Low-Resolution Imagery

Paper Title

MultiRes-NetVLAD: Augmenting Place Recognition Training with Low-Resolution Imagery

Authors

Khaliq, Ahmad; Milford, Michael; Garg, Sourav

Abstract

Visual Place Recognition (VPR) is a crucial component of 6-DoF localization, visual SLAM and structure-from-motion pipelines, tasked with generating an initial list of place match hypotheses by matching global place descriptors. However, commonly used CNN-based methods either process multiple image resolutions after training or use a single resolution and limit multi-scale feature extraction to the last convolutional layer during training. In this paper, we augment NetVLAD representation learning with low-resolution image pyramid encoding, which leads to richer place representations. The resultant multi-resolution feature pyramid can be conveniently aggregated through VLAD into a single compact representation, avoiding the need for concatenation or summation of multiple patches in recent multi-scale approaches. Furthermore, we show that the underlying learnt feature tensor can be combined with existing multi-scale approaches to improve their baseline performance. Evaluation on 15 viewpoint-varying and viewpoint-consistent benchmarking datasets confirms that the proposed MultiRes-NetVLAD leads to state-of-the-art Recall@N performance for global-descriptor-based retrieval, compared against 11 existing techniques. Source code is publicly available at https://github.com/Ahmedest61/MultiRes-NetVLAD.
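The central idea of the abstract, pooling local descriptors from several low-resolution copies of an image into one set and aggregating them with a single VLAD layer, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation (that lives in the linked repository). It rests on several assumptions: a toy grid of 2-D vectors stands in for CNN feature maps, 2x2 average pooling stands in for producing lower-resolution inputs, and the centroids, `alpha`, and every function name are hypothetical. The soft assignment, per-cluster intra-normalization and final L2 normalization follow the standard NetVLAD formulation.

```python
import math
from typing import List, Sequence

Vector = List[float]
Grid = List[List[Vector]]  # H x W grid of D-dim local descriptors (toy stand-in for CNN features)


def downsample(grid: Grid) -> Grid:
    """Halve resolution with 2x2 average pooling (odd trailing rows/cols are dropped)."""
    h, w = len(grid) // 2, len(grid[0]) // 2
    out: Grid = []
    for i in range(h):
        row = []
        for j in range(w):
            cells = [grid[2 * i][2 * j], grid[2 * i][2 * j + 1],
                     grid[2 * i + 1][2 * j], grid[2 * i + 1][2 * j + 1]]
            row.append([sum(c[d] for c in cells) / 4.0 for d in range(len(cells[0]))])
        out.append(row)
    return out


def build_pyramid(grid: Grid, levels: int) -> List[Grid]:
    """Full-resolution grid followed by successively halved copies (hypothetical helper)."""
    pyramid = [grid]
    for _ in range(levels - 1):
        if len(pyramid[-1]) < 2 or len(pyramid[-1][0]) < 2:
            break  # cannot halve a grid that is already 1 cell tall or wide
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def l2_normalize(v: Vector) -> Vector:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def soft_vlad(descriptors: Sequence[Vector], centroids: Sequence[Vector],
              alpha: float = 10.0) -> Vector:
    """NetVLAD-style aggregation of a descriptor set into one K*D vector."""
    k, d = len(centroids), len(centroids[0])
    residual_sums = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Soft assignment: softmax over -alpha * squared distance to each centroid.
        logits = [-alpha * sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centroids]
        top = max(logits)
        weights = [math.exp(s - top) for s in logits]
        z = sum(weights)
        for j, c in enumerate(centroids):
            a = weights[j] / z
            for t in range(d):
                residual_sums[j][t] += a * (x[t] - c[t])
    # Intra-normalize each cluster block, flatten, then L2-normalize the whole vector.
    flat = [v for block in residual_sums for v in l2_normalize(block)]
    return l2_normalize(flat)


def multires_descriptor(grid: Grid, centroids: Sequence[Vector],
                        levels: int = 3, alpha: float = 10.0) -> Vector:
    """Pool descriptors from ALL pyramid levels into one set, then run VLAD once."""
    pyramid = build_pyramid(grid, levels)
    pooled = [cell for level in pyramid for row in level for cell in row]
    return soft_vlad(pooled, centroids, alpha)


if __name__ == "__main__":
    # Toy 4x4 "image" of 2-D descriptors and K=2 hypothetical centroids.
    grid = [[[float(i), float(j)] for j in range(4)] for i in range(4)]
    centroids = [[0.0, 0.0], [3.0, 3.0]]
    desc = multires_descriptor(grid, centroids, levels=3)
    print(len(desc))  # K * D = 4, independent of the number of pyramid levels
```

Because descriptors from every level are merged into a single set before aggregation, the output length stays K x D however many levels are used. Concatenating one VLAD vector per level would instead grow linearly with the number of levels, which is the overhead the abstract says this design avoids.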
