Paper Title

MultiRes-NetVLAD: Augmenting Place Recognition Training with Low-Resolution Imagery

Authors

Ahmad Khaliq, Michael Milford, Sourav Garg

Abstract

Visual Place Recognition (VPR) is a crucial component of 6-DoF localization, visual SLAM and structure-from-motion pipelines, tasked with generating an initial list of place match hypotheses by matching global place descriptors. However, commonly used CNN-based methods either process multiple image resolutions only after training, or use a single resolution and limit multi-scale feature extraction to the last convolutional layer during training. In this paper, we augment NetVLAD representation learning with low-resolution image pyramid encoding, which leads to richer place representations. The resultant multi-resolution feature pyramid can be conveniently aggregated through VLAD into a single compact representation, avoiding the concatenation or summation of multiple patches required by recent multi-scale approaches. Furthermore, we show that the underlying learnt feature tensor can be combined with existing multi-scale approaches to improve their baseline performance. Evaluation on 15 viewpoint-varying and viewpoint-consistent benchmarking datasets confirms that the proposed MultiRes-NetVLAD achieves state-of-the-art Recall@N performance for global-descriptor-based retrieval when compared against 11 existing techniques. Source code is publicly available at https://github.com/Ahmedest61/MultiRes-NetVLAD.
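
The aggregation idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`downsample`, `local_descriptors`, `vlad`, `multires_vlad`) are hypothetical, the "features" are a toy stand-in for CNN activations, and plain hard-assignment VLAD replaces NetVLAD's learned soft assignment. It is only meant to show that local descriptors from several resolutions can be pooled into one set before aggregation, so the final descriptor length stays at (number of clusters) x (descriptor dimension) however many pyramid levels are used.

```python
import math


def downsample(image, factor):
    """Average-pool a 2D list-of-lists image by an integer factor (assumed pyramid step)."""
    h, w = len(image) // factor, len(image[0]) // factor
    return [
        [
            sum(image[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor)) / factor ** 2
            for x in range(w)
        ]
        for y in range(h)
    ]


def local_descriptors(image):
    """Toy stand-in for CNN features: (intensity, horizontal gradient) per pixel pair."""
    descs = []
    for row in image:
        for x in range(len(row) - 1):
            descs.append((row[x], row[x + 1] - row[x]))
    return descs


def vlad(descriptors, centroids):
    """Hard-assignment VLAD: sum residuals to the nearest centroid, then L2-normalise."""
    k, d = len(centroids), len(centroids[0])
    acc = [[0.0] * d for _ in range(k)]
    for desc in descriptors:
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(desc, centroids[i])),
        )
        for j in range(d):
            acc[nearest][j] += desc[j] - centroids[nearest][j]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0  # guard against a zero vector
    return [v / norm for v in flat]


def multires_vlad(image, factors, centroids):
    """Pool local descriptors from every pyramid level, then aggregate once with VLAD."""
    pooled = []
    for factor in factors:
        pooled.extend(local_descriptors(downsample(image, factor)))
    return vlad(pooled, centroids)
```

Because all levels share one descriptor set and one set of centroids, calling `multires_vlad(image, (1, 2, 4), centroids)` returns a vector of the same length as `multires_vlad(image, (1,), centroids)`; a concatenation-based multi-scale scheme would instead grow with the number of scales.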
