Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose Estimation

Paper Title

Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose Estimation

Authors

Deng, Haowen; Bui, Mai; Navab, Nassir; Guibas, Leonidas; Ilic, Slobodan; Birdal, Tolga

Abstract

In this work, we introduce Deep Bingham Networks (DBN), a generic framework that can naturally handle the pose-related uncertainties and ambiguities arising in almost all real-life applications concerning 3D data. While existing works strive to find a single solution to the pose estimation problem, we make peace with the ambiguities that cause high uncertainty about which solution to identify as the best. Instead, we report a family of poses that captures the nature of the solution space. DBN extends state-of-the-art direct pose regression networks by (i) a multi-hypothesis prediction head that can yield different distribution modes; and (ii) novel loss functions that benefit from Bingham distributions on rotations. This way, DBN can work both in unambiguous cases, where it provides uncertainty information, and in ambiguous scenes, where an uncertainty per mode is desired. On the technical front, our network regresses continuous Bingham mixture models and is applicable both to 2D data such as images and to 3D data such as point clouds. We propose new training strategies to avoid mode or posterior collapse during training and to improve numerical stability. Our methods are thoroughly tested on two different applications exploiting two different modalities: (i) 6D camera relocalization from images; and (ii) object pose estimation from 3D point clouds, demonstrating decent advantages over the state of the art. For the former, we contribute our own dataset composed of five indoor scenes in which it is unavoidable to capture images corresponding to views that are hard to uniquely identify. For the latter, we achieve the top results, especially for symmetric objects of the ModelNet dataset.
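For readers unfamiliar with the Bingham distribution mentioned in the abstract, the following is a minimal sketch of its unnormalized log-density on unit quaternions. It is not the authors' implementation. It assumes the common parameterization in which the density is proportional to exp(q^T M diag(Z) M^T q), with M holding four orthonormal axes and Z holding non-positive concentrations whose first entry is 0 (so the first axis is the mode). The function names and the example values are hypothetical, and the normalizing constant is omitted.

```python
import math

def normalize(q):
    """Scale a 4-vector to unit length so it represents a rotation quaternion."""
    n = math.sqrt(sum(x * x for x in q))
    return [x / n for x in q]

def bingham_unnormalized_log_density(q, axes, concentrations):
    """Return q^T M diag(Z) M^T q, the Bingham log-density up to a constant.

    axes: four orthonormal 4-vectors (the columns of M); assumed, not checked.
    concentrations: four values Z_i <= 0, with Z_0 = 0 for the mode axis.
    """
    q = normalize(q)
    total = 0.0
    for axis, z in zip(axes, concentrations):
        # Squared projection of q onto this axis, weighted by its concentration.
        proj = sum(a * b for a, b in zip(axis, q))
        total += z * proj * proj
    return total

# Hypothetical example: mode at the identity quaternion, equal spread elsewhere.
axes = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
concentrations = [0.0, -10.0, -10.0, -10.0]

mode = [1.0, 0.0, 0.0, 0.0]
flipped_mode = [-x for x in mode]
other = [0.5, 0.5, 0.5, 0.5]

log_p_mode = bingham_unnormalized_log_density(mode, axes, concentrations)
log_p_flipped = bingham_unnormalized_log_density(flipped_mode, axes, concentrations)
log_p_other = bingham_unnormalized_log_density(other, axes, concentrations)
```

Because the density depends on q only through squared projections, q and -q receive the same value. This matches the fact that both quaternions encode the same rotation, which is one reason the Bingham distribution is a natural choice for rotations.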

