Paper Title
OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object Detection

Paper Authors

Yongzhi Su, Yan Di, Fabian Manhardt, Guangyao Zhai, Jason Rambach, Benjamin Busam, Didier Stricker, Federico Tombari

Paper Abstract
Despite monocular 3D object detection having recently made a significant leap forward thanks to the use of pre-trained depth estimators for pseudo-LiDAR recovery, such two-stage methods typically suffer from overfitting and are incapable of explicitly encapsulating the geometric relation between depth and object bounding box. To overcome this limitation, we instead propose OPA-3D, a single-stage, end-to-end, Occlusion-Aware Pixel-Wise Aggregation network that jointly estimates dense scene depth together with depth-bounding box residuals and object bounding boxes, allowing a two-stream detection of 3D objects and leading to significantly more robust detections. Thereby, the first stream, denoted the Geometry Stream, combines visible depth and depth-bounding box residuals to recover the object bounding box via explicit occlusion-aware optimization. In addition, a bounding-box-based geometry projection scheme is employed to enhance distance perception. The second stream, named the Context Stream, directly regresses the 3D object location and size. This novel two-stream representation further enables us to enforce cross-stream consistency terms that align the outputs of both streams, improving the overall performance. Extensive experiments on the public benchmark demonstrate that OPA-3D outperforms state-of-the-art methods on the main Car category, whilst keeping a real-time inference speed. We plan to release all code and trained models soon.
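
To make the two-stream design described above more concrete, below is a minimal, hypothetical PyTorch sketch, not the authors' released code: the head layout, the 7-parameter box encoding, and the L1 consistency term are assumptions for illustration. It shows a Geometry Stream that combines per-pixel visible depth with depth-to-box residuals, a Context Stream that regresses the box directly, and a cross-stream consistency loss that aligns the two outputs.

```python
import torch
import torch.nn as nn


class TwoStreamHead(nn.Module):
    """Hypothetical two-stream detection head on top of shared backbone features."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # Geometry Stream: per-pixel visible depth and depth-to-bounding-box residuals.
        self.visible_depth = nn.Conv2d(feat_dim, 1, kernel_size=1)
        # Assumed 7-parameter box encoding: (x, y, z, w, h, l, yaw).
        self.depth_box_residual = nn.Conv2d(feat_dim, 7, kernel_size=1)
        # Context Stream: direct regression of 3D object location and size.
        self.context_box = nn.Conv2d(feat_dim, 7, kernel_size=1)

    def forward(self, feats: torch.Tensor):
        depth = self.visible_depth(feats)           # (B, 1, H, W) visible depth
        residual = self.depth_box_residual(feats)   # (B, 7, H, W) residuals to the box
        geo_box = residual.clone()
        # Assumed aggregation: object depth = visible depth + depth-to-center residual.
        geo_box[:, 2:3] = geo_box[:, 2:3] + depth
        ctx_box = self.context_box(feats)           # (B, 7, H, W) direct regression
        return geo_box, ctx_box


def cross_stream_consistency(geo_box: torch.Tensor, ctx_box: torch.Tensor) -> torch.Tensor:
    """L1 penalty on the disagreement between the two streams' box predictions."""
    return torch.mean(torch.abs(geo_box - ctx_box))


if __name__ == "__main__":
    head = TwoStreamHead()
    feats = torch.randn(2, 64, 96, 320)  # dummy backbone feature map
    geo, ctx = head(feats)
    print("cross-stream consistency:", cross_stream_consistency(geo, ctx).item())
```

The consistency term sketched here is only one plausible way to couple the streams; the paper's actual occlusion-aware optimization and geometry projection are not reproduced in this snippet.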
