DPODv2: Dense Correspondence-Based 6 DoF Pose Estimation

Paper Title

DPODv2: Dense Correspondence-Based 6 DoF Pose Estimation

Authors

Ivan Shugurov, Sergey Zakharov, Slobodan Ilic

Abstract

We propose a three-stage 6 DoF object detection method called DPODv2 (Dense Pose Object Detector) that relies on dense correspondences. We combine a 2D object detector with a dense correspondence estimation network and a multi-view pose refinement method to estimate a full 6 DoF pose. Unlike other deep learning methods that are typically restricted to monocular RGB images, we propose a unified deep learning network allowing different imaging modalities to be used (RGB or Depth). Moreover, we propose a novel pose refinement method based on differentiable rendering. The main concept is to compare predicted and rendered correspondences in multiple views to obtain a pose that is consistent with the predicted correspondences in all views. Our proposed method is evaluated rigorously on different data modalities and types of training data in a controlled setup. The main conclusion is that RGB excels in correspondence estimation, while depth contributes to pose accuracy if good 3D-3D correspondences are available. Naturally, their combination achieves the overall best performance. We perform an extensive evaluation and an ablation study to analyze and validate the results on several challenging datasets. DPODv2 achieves excellent results on all of them while remaining fast and scalable, independent of the data modality used and the type of training data.
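
The multi-view consistency idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it replaces differentiable rendering with plain pinhole reprojection of model points, and every name in it (`transform`, `project`, `multiview_error`, the camera intrinsics, the toy model points) is a hypothetical choice made for illustration. It only shows how a single object pose can be scored against predicted 2D-3D correspondences from several views at once.

```python
import math

def transform(R, t, p):
    """Apply the rigid transform x -> R @ x + t to a 3D point."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def project(p, f, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = p
    return (f * x / z + cx, f * y / z + cy)

def multiview_error(R_obj, t_obj, views, f, cx, cy):
    """Mean pixel distance between predicted and reprojected correspondences.

    views: list of (R_cam, t_cam, correspondences); each correspondence is
    ((u, v), (X, Y, Z)), a pixel paired with a point in model coordinates.
    (Assumed data layout, chosen for this sketch only.)
    """
    total, count = 0.0, 0
    for R_cam, t_cam, corrs in views:
        for (u, v), model_pt in corrs:
            p_world = transform(R_obj, t_obj, model_pt)   # model -> world
            p_cam = transform(R_cam, t_cam, p_world)      # world -> this view
            u_hat, v_hat = project(p_cam, f, cx, cy)
            total += math.hypot(u_hat - u, v_hat - v)
            count += 1
    return total / count

# Toy setup (all values are illustrative assumptions).
I3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
F, CX, CY = 500.0, 320.0, 240.0
model_points = [(0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.1), (-0.1, -0.1, 0.0)]
t_true = (0.0, 0.0, 2.0)

def make_view(R_cam, t_cam):
    """Simulate perfect correspondence predictions for one camera."""
    corrs = [
        (project(transform(R_cam, t_cam, transform(I3, t_true, p)), F, CX, CY), p)
        for p in model_points
    ]
    return (R_cam, t_cam, corrs)

views = [make_view(I3, (0.0, 0.0, 0.0)), make_view(I3, (0.2, 0.0, 0.0))]

err_true = multiview_error(I3, t_true, views, F, CX, CY)          # correct pose
err_off = multiview_error(I3, (0.05, 0.0, 2.0), views, F, CX, CY)  # shifted pose
```

In this toy setup the error vanishes at the true pose and grows when the pose is shifted, so a refinement step would search for the pose that minimizes it jointly over all views.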
