Paper Title
Neural Mesh Refiner for 6-DoF Pose Estimation
Paper Authors
Paper Abstract
How can we effectively utilise 2D monocular image information to recover the 6D pose (6-DoF) of visual objects? Deep learning has been shown to be effective for robust, real-time monocular pose estimation. Oftentimes, the network learns to regress the 6-DoF pose using a naive loss function. However, because directly regressed pose estimates lack geometric scene understanding, there are misalignments between the mesh rendered from the 3D object and the 2D instance segmentation results, e.g., the predicted bounding boxes and masks. This paper bridges the gap between 2D mask generation and 3D location prediction via a differentiable neural mesh renderer. We utilise the overlay between the accurate mask prediction and the less accurate mesh prediction to iteratively optimise the directly regressed 6D pose, with a focus on translation estimation. By leveraging geometry, we demonstrate that our technique significantly improves direct regression performance on the difficult task of translation estimation, and achieves state-of-the-art results on the Peking University/Baidu Autonomous Driving dataset and the ApolloScape 3D Car Instance dataset. The code can be found at \url{https://bit.ly/2IRihfU}.
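The core idea of the abstract, using 2D mask evidence to iteratively refine a regressed translation, can be illustrated with a toy sketch. This is not the paper's implementation (which uses a differentiable neural mesh renderer and mask overlap); here a hypothetical pinhole projection of the object's centre and apparent width stands in for the renderer, and the translation t = (X, Y, Z) is refined by plain gradient descent on the reprojection error. All names, the focal length, and the object width are illustrative assumptions.

```python
def project(t, f, W):
    """Project centre (u, v) and apparent pixel width w of an object
    of physical width W metres at translation t, focal length f."""
    X, Y, Z = t
    return f * X / Z, f * Y / Z, f * W / Z

def refine_translation(t, target, f=1000.0, W=1.8, lr=1e-5, iters=5000):
    """Gradient descent on the squared error between the projected
    footprint and the 2D mask evidence (u, v, w) -- a stand-in for
    the paper's render-and-compare refinement loop."""
    u_t, v_t, w_t = target
    X, Y, Z = t
    for _ in range(iters):
        e_u = f * X / Z - u_t          # horizontal misalignment (px)
        e_v = f * Y / Z - v_t          # vertical misalignment (px)
        e_w = f * W / Z - w_t          # apparent-size mismatch (px)
        # Analytic gradients of the squared error w.r.t. (X, Y, Z).
        gX = 2 * e_u * f / Z
        gY = 2 * e_v * f / Z
        gZ = -2 * (e_u * f * X + e_v * f * Y + e_w * f * W) / Z ** 2
        X, Y, Z = X - lr * gX, Y - lr * gY, Z - lr * gZ
    return (X, Y, Z)

# Ground-truth car at (2.0, 1.0, 10.0) m; its mask supplies the 2D target.
target = project((2.0, 1.0, 10.0), 1000.0, 1.8)
# A coarse directly-regressed translation is pulled back towards the truth.
refined = refine_translation((1.8, 1.2, 12.0), target)
```

The depth component Z is the hardest to recover from a single image, which is why the mismatch in apparent size (e_w) matters: it is the only residual that constrains Z independently of X and Y, mirroring the paper's emphasis on translation refinement.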