Paper Title

RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving

Paper Authors

Peixuan Li, Huaici Zhao, Pengfei Liu, Feidao Cao

Paper Abstract

In this work, we propose an efficient and accurate single-shot monocular 3D detection framework. Most successful 3D detectors take the projection constraint from the 3D bounding box to the 2D box as an important component. The four edges of a 2D box provide only four constraints, and performance deteriorates dramatically with even small errors from the 2D detector. Different from these approaches, our method predicts the nine perspective keypoints of a 3D bounding box in image space, and then utilizes the geometric relationship between the 3D and 2D perspectives to recover the dimension, location, and orientation in 3D space. In this method, the properties of the object can be predicted stably even when the keypoint estimates are very noisy, which enables us to obtain a fast detection speed with a small architecture. Training our method requires only the 3D properties of the object, without the need for external networks or supervision data. Our method is the first real-time system for monocular image 3D detection, while achieving state-of-the-art performance on the KITTI benchmark. Code will be released at https://github.com/Banconxuan/RTM3D.
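
As a rough illustration of the geometry the abstract refers to, the sketch below projects the nine keypoints of a 3D bounding box (assumed here to be the eight corners plus the box center, following the usual KITTI convention) into image space with a pinhole camera. The function name, the KITTI-style box parameterization, and the sample intrinsics are illustrative assumptions, not code from the RTM3D repository.

```python
import numpy as np

def project_box_keypoints(dim, loc, ry, K):
    """Project the nine keypoints (eight corners + center) of a 3D box
    into image space with a pinhole camera model.

    dim: (h, w, l) box dimensions in meters
    loc: (x, y, z) bottom-center of the box in camera coordinates (KITTI convention)
    ry:  rotation around the camera y-axis (yaw)
    K:   3x3 camera intrinsic matrix
    """
    h, w, l = dim
    # Eight corners in the object frame plus the box center
    # (origin at the bottom-center of the box, y pointing down).
    x_c = [ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2,  0.0]
    y_c = [ 0.0,  0.0,  0.0,  0.0,  -h,   -h,   -h,   -h,  -h/2]
    z_c = [ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2,  0.0]
    pts = np.vstack([x_c, y_c, z_c])                    # 3 x 9

    # Rotate by yaw and translate into camera coordinates.
    c, s = np.cos(ry), np.sin(ry)
    R = np.array([[ c, 0, s],
                  [ 0, 1, 0],
                  [-s, 0, c]])
    pts_cam = R @ pts + np.asarray(loc).reshape(3, 1)   # 3 x 9

    # Perspective projection: u = fx*X/Z + cx, v = fy*Y/Z + cy.
    uvw = K @ pts_cam
    uv = uvw[:2] / uvw[2:3]                             # 2 x 9 image points
    return uv.T                                         # 9 x 2

# Example call with made-up, KITTI-like intrinsics and box parameters.
K = np.array([[721.5,   0.0, 609.6],
              [  0.0, 721.5, 172.9],
              [  0.0,   0.0,   1.0]])
keypoints_2d = project_box_keypoints(dim=(1.5, 1.6, 3.9),
                                     loc=(1.0, 1.5, 20.0),
                                     ry=0.1, K=K)
```

As the abstract describes, the detection pipeline solves the inverse problem: given noisy estimates of these nine image keypoints, the dimension, location, and orientation are recovered by exploiting the geometric relationship between the 3D box and its 2D projections under this projection model.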
