Paper title
GSV-Cities: Toward Appropriate Supervised Visual Place Recognition
Authors
Abstract
This paper aims to investigate representation learning for large-scale visual place recognition, which consists of determining the location depicted in a query image by referring to a database of reference images. This is a challenging task due to the large-scale environmental changes that can occur over time (e.g., weather, illumination, season, traffic, occlusion). Progress is currently challenged by the lack of large databases with accurate ground truth. To address this challenge, we introduce GSV-Cities, a new image dataset providing the widest geographic coverage to date with highly accurate ground truth, covering more than 40 cities across all continents over a 14-year period. We subsequently explore the full potential of recent advances in deep metric learning to train networks specifically for place recognition, and evaluate how different loss functions influence performance. In addition, we show that the performance of existing methods substantially improves when trained on GSV-Cities. Finally, we introduce a new fully convolutional aggregation layer that outperforms existing techniques, including GeM, NetVLAD and CosPlace, and establish a new state of the art on large-scale benchmarks, such as Pittsburgh, Mapillary-SLS, SPED and Nordland. The dataset and code are available for research purposes at https://github.com/amaralibey/gsv-cities.
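The retrieval formulation in the abstract (locating a query image by referring to a database of reference images) can be illustrated with a minimal sketch. It is not the authors' code: it assumes each image has already been mapped to a fixed-length descriptor by some network, and the function names, the toy three-dimensional descriptors, and the ground-truth lists are all hypothetical. Matching uses cosine similarity, and recall@k, a commonly used retrieval metric, is computed over the toy data.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k_matches(query, references, k=1):
    """Return indices of the k reference descriptors most similar to the query."""
    q = l2_normalize(query)
    scores = []
    for idx, ref in enumerate(references):
        r = l2_normalize(ref)
        cosine = sum(a * b for a, b in zip(q, r))
        scores.append((cosine, idx))
    # Highest similarity first; ties are broken by the lower index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scores[:k]]


def recall_at_k(queries, references, ground_truth, k=1):
    """Fraction of queries with at least one correct reference in the top k."""
    hits = 0
    for query, positives in zip(queries, ground_truth):
        if set(top_k_matches(query, references, k)) & set(positives):
            hits += 1
    return hits / len(queries)


# Hypothetical toy descriptors: three reference places and two queries.
references = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
queries = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.9]]
# Hypothetical ground truth: indices of references showing the same place.
ground_truth = [[0], [2]]

print(recall_at_k(queries, references, ground_truth, k=1))
```

In a real system the descriptors would come from a trained backbone and aggregation layer, and the exhaustive loop would typically be replaced by an optimized nearest-neighbor search; the sketch only shows the matching and scoring step.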