Paper Title

USegScene: Unsupervised Learning of Depth, Optical Flow and Ego-Motion with Semantic Guidance and Coupled Networks

Paper Authors

Johan Vertens, Wolfram Burgard

Paper Abstract

In this paper we propose USegScene, a framework for semantically guided unsupervised learning of depth, optical flow and ego-motion estimation for stereo camera images using convolutional neural networks. Our framework leverages semantic information for improved regularization of depth and optical flow maps, multimodal fusion and occlusion filling considering dynamic rigid object motions as independent SE(3) transformations. Furthermore, complementary to pure photo-metric matching, we propose matching of semantic features, pixel-wise classes and object instance borders between the consecutive images. In contrast to previous methods, we propose a network architecture that jointly predicts all outputs using shared encoders and allows passing information across the task-domains, e.g., the prediction of optical flow can benefit from the prediction of the depth. Furthermore, we explicitly learn the depth and optical flow occlusion maps inside the network, which are leveraged in order to improve the predictions in the respective regions. We present results on the popular KITTI dataset and show that our approach outperforms other methods by a large margin.
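The abstract's central architectural idea is a coupled network: one shared encoder feeding separate heads for depth, optical flow and ego-motion, with information passed across tasks (e.g., the flow head consuming the depth prediction). Below is a minimal PyTorch sketch of that idea. All module names, channel sizes and the specific cross-task connection are illustrative assumptions, not the authors' actual architecture.

```python
# Minimal sketch of a coupled network: shared encoder, task-specific heads,
# and a cross-task link from the depth prediction into the flow head.
# Channel sizes and layer counts are arbitrary placeholders.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Toy convolutional encoder shared by all task heads."""
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)

class CoupledHeads(nn.Module):
    """Depth, optical-flow and ego-motion heads over the shared features."""
    def __init__(self, feat_ch=64):
        super().__init__()
        self.depth_head = nn.Conv2d(feat_ch, 1, 3, padding=1)
        # The flow head sees the image features plus the predicted depth map,
        # mimicking the cross-task information passing the abstract describes.
        self.flow_head = nn.Conv2d(feat_ch + 1, 2, 3, padding=1)
        # 6-DoF ego-motion regressed from globally pooled features.
        self.pose_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat_ch, 6)
        )

    def forward(self, feats):
        depth = self.depth_head(feats)
        flow = self.flow_head(torch.cat([feats, depth], dim=1))
        pose = self.pose_head(feats)
        return depth, flow, pose

encoder, heads = SharedEncoder(), CoupledHeads()
frame = torch.randn(1, 3, 128, 256)             # dummy camera frame
depth, flow, pose = heads(encoder(frame))
print(depth.shape, flow.shape, pose.shape)      # (1,1,32,64) (1,2,32,64) (1,6)
```

Sharing the encoder keeps the task predictions consistent with one another and lets gradients from every loss shape the common features, which is the motivation the abstract gives for the coupled design.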

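The abstract also models each dynamic rigid object's motion as an independent SE(3) transformation. The sketch below shows the standard SE(3) exponential map turning a 6-DoF twist into a 4x4 rigid transform and applying it to 3D points (e.g., back-projected from a depth map). The closed form is the textbook Rodrigues-style formula; the function names and the dummy twist are ours, not taken from the paper's code.

```python
# Standard SE(3) exponential map: twist (omega[0:3], v[3:6]) -> 4x4 transform.
import torch

def hat(w: torch.Tensor) -> torch.Tensor:
    """Skew-symmetric (hat) matrix of a 3-vector."""
    wx, wy, wz = w.unbind()
    z = torch.zeros((), dtype=w.dtype)
    return torch.stack([
        torch.stack([z, -wz, wy]),
        torch.stack([wz, z, -wx]),
        torch.stack([-wy, wx, z]),
    ])

def se3_exp(twist: torch.Tensor) -> torch.Tensor:
    """Map a twist to a 4x4 rigid transform via the SE(3) exponential."""
    omega, v = twist[:3], twist[3:]
    theta = omega.norm()
    K, I = hat(omega), torch.eye(3, dtype=twist.dtype)
    if theta < 1e-8:                       # small-angle fallback: R ~ I + K
        R, V = I + K, I
    else:
        a = torch.sin(theta) / theta
        b = (1 - torch.cos(theta)) / theta ** 2
        c = (theta - torch.sin(theta)) / theta ** 3
        R = I + a * K + b * (K @ K)        # Rodrigues' rotation formula
        V = I + b * K + c * (K @ K)        # left Jacobian of SO(3)
    T = torch.eye(4, dtype=twist.dtype)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T

# Apply a (dummy) per-object twist to 3D points back-projected from depth.
twist = torch.tensor([0.0, 0.1, 0.0, 0.5, 0.0, 0.0])
points = torch.randn(5, 3)
homog = torch.cat([points, torch.ones(5, 1)], dim=1)  # homogeneous coords
moved = (se3_exp(twist) @ homog.T).T[:, :3]
```

Estimating one such transform per detected rigid object, in addition to the camera's own ego-motion, is what lets a rigid-motion model explain moving objects instead of treating them as photometric outliers.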