Paper Title
Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation
Paper Authors
Paper Abstract
Holistically understanding an object and its 3D movable parts through visual perception models is essential for enabling an autonomous agent to interact with the world. For autonomous driving, the dynamics and states of vehicle parts such as doors, the trunk, and the bonnet can provide meaningful semantic information and interaction states, which are essential to ensuring the safety of the self-driving vehicle. Existing visual perception models mainly focus on coarse parsing such as object bounding box detection or pose estimation and rarely tackle these situations. In this paper, we address this important autonomous driving problem by solving three critical issues. First, to deal with data scarcity, we propose an effective training data generation process that fits a 3D car model with dynamic parts to vehicles in real images and then reconstructs vehicle-human interaction (VHI) scenarios. Our approach is fully automatic, requiring no human intervention, and can generate a large number of vehicles in uncommon states (VUS) for training deep neural networks (DNNs). Second, to perform fine-grained vehicle perception, we present a multi-task network for VUS parsing and a multi-stream network for VHI parsing. Third, to quantitatively evaluate the effectiveness of our data augmentation approach, we build the first VUS dataset in real traffic scenarios (e.g., getting in/out of a vehicle or placing/removing luggage). Experimental results show that our approach outperforms baseline methods in 2D detection and instance segmentation by a large margin (over 8%). In addition, our network yields substantial improvements in discovering and understanding these uncommon cases. Moreover, we have released the source code, the dataset, and the trained model on GitHub (https://github.com/zongdai/EditingForDNN).
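The core of the data generation step described in the abstract is articulating a movable part (a door, the trunk, the bonnet) of a 3D car model that has been fitted to a real image, before re-rendering the edited model into the scene. The snippet below is a minimal sketch of that articulation step only, under stated assumptions: a part is given as a vertex array with a known hinge line, and the function names (rodrigues, articulate_part) are illustrative, not taken from the released EditingForDNN code.

```python
import numpy as np

def rodrigues(axis, angle):
    """Rotation matrix about a unit axis by `angle`, via Rodrigues' formula."""
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])  # skew-symmetric cross-product matrix
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def articulate_part(vertices, hinge_point, hinge_axis, angle):
    """Rotate a movable part (e.g., a door) about its hinge line.

    vertices:    (N, 3) part vertices in the car's coordinate frame.
    hinge_point: a point on the hinge line.
    hinge_axis:  direction of the hinge line.
    angle:       opening angle in radians (randomly sampled during augmentation).
    """
    R = rodrigues(hinge_axis, angle)
    # Move the hinge to the origin, rotate, and move back.
    return (vertices - hinge_point) @ R.T + hinge_point

# Hypothetical usage: open a door-like part by a random angle, then the edited
# mesh would be rendered with the fitted camera pose and composited into the image.
rng = np.random.default_rng(0)
door = rng.uniform(-1.0, 1.0, size=(100, 3))      # placeholder part mesh
opened = articulate_part(door,
                         hinge_point=np.array([0.8, 0.0, 0.0]),
                         hinge_axis=np.array([0.0, 0.0, 1.0]),
                         angle=np.deg2rad(rng.uniform(20.0, 70.0)))
```

Rendering the articulated model back into the source image (with the fitted pose and lighting) is the remaining piece of the pipeline; see the released code at the GitHub link above for the authors' actual implementation.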