Paper Title

Estimation of Appearance and Occupancy Information in Birds Eye View from Surround Monocular Images

Paper Authors

Sarthak Sharma, Unnikrishnan R. Nair, Udit Singh Parihar, Midhun Menon S, Srikanth Vidapanakal

Paper Abstract

Autonomous driving requires efficient reasoning about the location and appearance of the different agents in the scene, which aids in downstream tasks such as object detection, object tracking, and path planning. The past few years have witnessed a surge in approaches that combine the different task-based modules of the classic self-driving stack into an End-to-End (E2E) trainable learning system. These approaches replace the perception, prediction, and sensor-fusion modules with a single contiguous module with a shared latent-space embedding, from which one extracts a human-interpretable representation of the scene. One of the most popular representations is the Bird's-Eye View (BEV), which expresses the location of different traffic participants in the ego-vehicle frame from a top-down view. However, a BEV does not capture the chromatic appearance information of the participants. To overcome this limitation, we propose a novel representation that captures the appearance and occupancy information of various traffic participants from an array of monocular cameras covering a 360° field of view (FOV). We use a learned embedding of all the camera images to generate a BEV of the scene at any instant that captures both the appearance and occupancy of the scene, which can aid in downstream tasks such as object tracking and executing language-based commands. We test the efficacy of our approach on a synthetic dataset generated from CARLA. The code, dataset, and results can be found at https://rebrand.ly/APP OCC-results.
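To make the camera-to-BEV lifting in the abstract concrete, here is a minimal, hypothetical PyTorch sketch of one common way to project per-camera feature maps onto a ground-plane BEV grid via inverse perspective mapping. It assumes flat ground and known intrinsics/extrinsics, and the function name, shapes, and averaging scheme are illustrative assumptions, not the authors' implementation (which learns the image embedding end-to-end).

```python
# Minimal sketch (not the paper's code): splat surround-camera feature maps
# onto a flat-ground bird's-eye-view (BEV) grid using known calibration.
import torch
import torch.nn.functional as F

def bev_from_cameras(feats, Ks, T_cam_from_ego, bev_size=200, bev_range=50.0):
    """feats: (N_cam, C, H, W) per-camera feature maps.
    Ks: (N_cam, 3, 3) intrinsics. T_cam_from_ego: (N_cam, 4, 4) extrinsics.
    Returns (C, bev_size, bev_size) BEV features averaged over visible cameras."""
    N, C, H, W = feats.shape
    # Ego-frame ground-plane points (z = 0), one per BEV cell.
    xs = torch.linspace(-bev_range, bev_range, bev_size)
    ys = torch.linspace(-bev_range, bev_range, bev_size)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    pts = torch.stack([gx, gy, torch.zeros_like(gx), torch.ones_like(gx)], -1)
    pts = pts.reshape(-1, 4).T  # homogeneous points, shape (4, S*S)

    bev_sum = torch.zeros(C, bev_size * bev_size)
    weight = torch.zeros(1, bev_size * bev_size)
    for i in range(N):
        cam_pts = (T_cam_from_ego[i] @ pts)[:3]      # points in camera frame
        depth = cam_pts[2]
        uvw = Ks[i] @ cam_pts                        # pinhole projection
        uv = uvw[:2] / uvw[2].clamp(min=1e-6)        # pixel coordinates
        # Keep points in front of the camera and inside the image.
        valid = ((depth > 0.1) & (uv[0] >= 0) & (uv[0] < W)
                 & (uv[1] >= 0) & (uv[1] < H)).float()
        # Normalize pixel coordinates to [-1, 1] for grid_sample.
        grid = torch.stack([uv[0] / (W - 1) * 2 - 1,
                            uv[1] / (H - 1) * 2 - 1], -1)
        sampled = F.grid_sample(feats[i:i + 1], grid.view(1, 1, -1, 2),
                                align_corners=True).squeeze(0).squeeze(1)
        bev_sum += sampled * valid
        weight += valid
    return (bev_sum / weight.clamp(min=1)).reshape(C, bev_size, bev_size)
```

A flat-ground projection like this cannot represent elevated structure and smears tall objects; learned lifting with per-pixel depth distributions (as in Lift-Splat-style methods) is a common alternative that a learned embedding such as the one described above can subsume.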
