Paper Title
Coherent Reconstruction of Multiple Humans from a Single Image
Paper Authors
Paper Abstract
In this work, we address the problem of multi-person 3D pose estimation from a single image. A typical regression approach in the top-down setting of this problem would first detect all humans and then reconstruct each one of them independently. However, this type of prediction suffers from incoherent results, e.g., interpenetration and inconsistent depth ordering between the people in the scene. Our goal is to train a single network that learns to avoid these problems and generate a coherent 3D reconstruction of all the humans in the scene. To this end, a key design choice is the incorporation of the SMPL parametric body model in our top-down framework, which enables the use of two novel losses. First, a distance field-based collision loss penalizes interpenetration among the reconstructed people. Second, a depth ordering-aware loss reasons about occlusions and promotes a depth ordering of people that leads to a rendering which is consistent with the annotated instance segmentation. This provides depth supervision signals to the network, even if the image has no explicit 3D annotations. The experiments show that our approach outperforms previous methods on standard 3D pose benchmarks, while our proposed losses enable more coherent reconstruction in natural images. The project website with videos, results, and code can be found at: https://jiangwenpl.github.io/multiperson
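The depth ordering-aware loss described above can be illustrated with a minimal sketch. This is not the paper's implementation: the exact penalty, the rendering pipeline, and the handling of pixels a person does not cover are assumptions here. The sketch assumes each person has a rendered per-pixel depth map (infinite where the person is absent) and an annotated instance segmentation; wherever the annotation says person i is visible but some person j renders closer, a smooth penalty pushes the network to reorder them.

```python
import numpy as np

def depth_ordering_loss(depth_maps, seg_labels):
    """Hedged sketch of a depth ordering-aware loss (not the paper's code).

    depth_maps: (N, H, W) rendered depth for each of N people,
                np.inf where a person does not cover the pixel.
    seg_labels: (H, W) annotated instance segmentation; value k means
                person k is visible at that pixel, -1 means background.

    At every pixel where the annotation says person i is visible but
    another person j renders with smaller depth (i.e., wrongly occludes
    person i), add a log(1 + exp(d_i - d_j)) penalty that decreases as
    person i moves in front of person j.
    """
    n, h, w = depth_maps.shape
    loss = 0.0
    for y in range(h):
        for x in range(w):
            i = seg_labels[y, x]
            if i < 0:
                continue  # background pixel: no ordering constraint
            d_i = depth_maps[i, y, x]
            if not np.isfinite(d_i):
                continue  # annotated person does not render here; skip
            for j in range(n):
                d_j = depth_maps[j, y, x]
                if j != i and d_j < d_i:
                    # person j wrongly occludes person i at this pixel
                    loss += np.log1p(np.exp(d_i - d_j))
    return loss
```

A correctly ordered scene incurs zero loss, so the term only supervises pixels where the rendered ordering contradicts the annotated segmentation, which is how it can provide a depth signal without explicit 3D annotations.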