Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks

Paper Title

Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks

Authors

Thomas Roddick, Roberto Cipolla

Abstract

Autonomous vehicles commonly rely on highly detailed birds-eye-view maps of their environment, which capture both static elements of the scene such as road layout as well as dynamic elements such as other cars and pedestrians. Generating these map representations on the fly is a complex multi-stage process which incorporates many important vision-based elements, including ground plane estimation, road segmentation and 3D object detection. In this work we present a simple, unified approach for estimating maps directly from monocular images using a single end-to-end deep learning architecture. For the maps themselves we adopt a semantic Bayesian occupancy grid framework, allowing us to trivially accumulate information over multiple cameras and timesteps. We demonstrate the effectiveness of our approach by evaluating against several challenging baselines on the NuScenes and Argoverse datasets, and show that we are able to achieve a relative improvement of 9.1% and 22.3% respectively compared to the best-performing existing method.
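
The abstract notes that a Bayesian occupancy grid makes it easy to accumulate information over multiple cameras and timesteps. The sketch below illustrates why, using the standard log-odds form of the Bayesian occupancy update; it is not the authors' code. The function names (`logit`, `fuse_cell`, `fuse_grids`), the uniform prior of 0.5, the assumption that all grids are already aligned to a common map frame, and the assumption that observations are conditionally independent are all choices made here for illustration.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of a probability, clamped away from 0 and 1 for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fuse_cell(probs, prior=0.5):
    """Fuse independent occupancy estimates for a single grid cell.

    Each observation adds its log-odds relative to the prior, so an
    observation equal to the prior contributes nothing.
    """
    log_odds = logit(prior) + sum(logit(p) - logit(prior) for p in probs)
    return 1.0 / (1.0 + math.exp(-log_odds))


def fuse_grids(grids, prior=0.5):
    """Fuse a list of equally sized 2D grids (lists of rows) cell by cell.

    Assumes the grids are already registered to the same map frame
    (hypothetical preprocessing, not shown here).
    """
    rows, cols = len(grids[0]), len(grids[0][0])
    return [
        [fuse_cell([g[r][c] for g in grids], prior) for c in range(cols)]
        for r in range(rows)
    ]


# Two views of a 1x2 grid: both see the first cell as likely occupied;
# only the second view has evidence about the second cell.
view_a = [[0.7, 0.5]]
view_b = [[0.7, 0.3]]
fused = fuse_grids([view_a, view_b])
```

Because the update is a sum of log-odds, adding another camera or another timestep is just one more term per cell, and cells a view cannot see can be left at the prior so that they do not affect the result.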
