Title

S3CNet: A Sparse Semantic Scene Completion Network for LiDAR Point Clouds

Authors

Ran Cheng, Christopher Agia, Yuan Ren, Xinhai Li, Bingbing Liu

Abstract

With the increasing reliance of self-driving and similar robotic systems on robust 3D vision, the processing of LiDAR scans with deep convolutional neural networks has become a trend in academia and industry alike. Prior attempts at the challenging Semantic Scene Completion task - which entails the inference of dense 3D structure and associated semantic labels from "sparse" representations - have been, to a degree, successful in small indoor scenes when provided with dense point clouds or dense depth maps, often fused with semantic segmentation maps from RGB images. However, the performance of these systems drops drastically when applied to large outdoor scenes characterized by dynamic and exponentially sparser conditions. Likewise, processing of the entire sparse volume becomes infeasible due to memory limitations, and workarounds introduce computational inefficiency as practitioners are forced to divide the overall volume into multiple equal segments and infer on each individually, rendering real-time performance impossible. In this work, we formulate a method that subsumes the sparsity of large-scale environments and present S3CNet, a sparse convolution based neural network that predicts the semantically completed scene from a single, unified LiDAR point cloud. We show that our proposed method outperforms all counterparts on the 3D task, achieving state-of-the-art results on the SemanticKITTI benchmark. Furthermore, we propose a 2D variant of S3CNet with a multi-view fusion strategy to complement our 3D network, providing robustness to occlusions and extreme sparsity in distant regions. We conduct experiments for the 2D semantic scene completion task and compare the results of our sparse 2D network against several leading LiDAR segmentation models adapted for bird's eye view segmentation on two open-source datasets.
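The sparsity the abstract refers to is the key practical point: an outdoor LiDAR sweep occupies only a small fraction of the voxels in a city-scale grid, so a sparse convolutional network operates on lists of active coordinates rather than a dense volume. Below is a minimal, hypothetical sketch (not the authors' code; the 0.2 m voxel size, grid extents, and all function names are illustrative assumptions) of how a raw scan can be turned into the sparse (coordinates, features) pair that sparse 3D CNNs consume, and flattened into the kind of bird's eye view occupancy grid a 2D variant could take as input.

```python
# Minimal sketch, assuming a KITTI-style (N, 4) scan of x, y, z, intensity.
# All parameters and names here are illustrative, not values from the paper.
import numpy as np

def voxelize_sparse(points, voxel_size=0.2,
                    extent=((0.0, 51.2), (-25.6, 25.6), (-2.0, 4.4))):
    """Return unique integer voxel coordinates and per-voxel mean features,
    i.e. only the active sites a sparse 3D convolution would visit."""
    lo = np.array([e[0] for e in extent])
    hi = np.array([e[1] for e in extent])
    mask = np.all((points[:, :3] >= lo) & (points[:, :3] < hi), axis=1)
    pts = points[mask]
    coords = ((pts[:, :3] - lo) / voxel_size).astype(np.int32)  # (M, 3) indices
    # Average the features of all points falling into the same voxel.
    uniq, inv = np.unique(coords, axis=0, return_inverse=True)
    feats = np.zeros((len(uniq), pts.shape[1]), dtype=np.float32)
    np.add.at(feats, inv, pts)
    feats /= np.bincount(inv).astype(np.float32)[:, None]
    return uniq, feats

def to_bev(coords, grid_shape=(256, 256)):
    """Collapse sparse 3D voxel coordinates along z into a 2D occupancy map,
    a simple bird's eye view input for a 2D completion/segmentation network."""
    bev = np.zeros(grid_shape, dtype=np.float32)
    xy = coords[:, :2]
    xy = xy[(xy[:, 0] < grid_shape[0]) & (xy[:, 1] < grid_shape[1])]
    bev[xy[:, 0], xy[:, 1]] = 1.0
    return bev

# Usage: scan = np.fromfile("sweep.bin", dtype=np.float32).reshape(-1, 4)
#        coords, feats = voxelize_sparse(scan); bev = to_bev(coords)
```

Keeping only active sites means memory and compute scale with the number of occupied voxels (typically well under 1% of the grid for a single outdoor sweep) rather than with the full volume, which is what makes inference on a single, unified point cloud tractable instead of requiring the scene to be split into equal segments.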
