Paper Title
Nonparametric Structure Regularization Machine for 2D Hand Pose Estimation
Authors
Abstract
Hand pose estimation is more challenging than body pose estimation due to the severe articulation, self-occlusion, and high dexterity of the hand. Current approaches often rely on a popular body pose algorithm, such as the Convolutional Pose Machine (CPM), to learn 2D keypoint features. These algorithms cannot adequately address the unique challenges of hand pose estimation, because they are trained solely on keypoint positions without explicitly modeling the structural relationships between them. We propose a novel Nonparametric Structure Regularization Machine (NSRM) for 2D hand pose estimation, adopting a cascade multi-task architecture to learn hand structure and keypoint representations jointly. The structure learning is guided by synthetic hand mask representations, which are computed directly from keypoint positions, and is further strengthened by a novel probabilistic representation of hand limbs and an anatomically inspired composition strategy for mask synthesis. We conduct extensive studies on two public datasets: OneHand 10k and CMU Panoptic Hand. Experimental results demonstrate that explicitly enforcing structure learning consistently improves the pose estimation accuracy of CPM baseline models, by 1.17% on the first dataset and 4.01% on the second. The implementation and experiment code are freely available online. Our proposal of incorporating structure learning into hand pose estimation requires no additional training information, and can serve as a generic add-on module for other pose estimation models.
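The abstract states that the structure masks are computed directly from keypoint positions and that limbs are given a probabilistic representation, but it does not spell out the formulas. The sketch below shows one plausible reading, not the authors' implementation: each limb is the line segment between two keypoints, a pixel's value decays as a Gaussian of its distance to that segment, and several limb masks are merged by a pixel-wise maximum. The function names, the Gaussian form, the `sigma` value, and the max-based merging are all assumptions made for illustration.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to the line segment AB."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate limb: both keypoints coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of P onto AB, clamped so it stays on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def limb_probabilistic_mask(kp_a, kp_b, height, width, sigma=1.5):
    """Soft limb mask (assumed form): exp(-d^2 / (2 * sigma^2)),
    where d is the pixel's distance to the limb segment."""
    (ax, ay), (bx, by) = kp_a, kp_b
    return [
        [
            math.exp(
                -point_segment_distance(x, y, ax, ay, bx, by) ** 2
                / (2.0 * sigma ** 2)
            )
            for x in range(width)
        ]
        for y in range(height)
    ]


def compose_masks(masks):
    """Merge equally sized limb masks into one group mask by pixel-wise maximum
    (an assumed composition rule, e.g. one group per finger)."""
    return [[max(vals) for vals in zip(*rows)] for rows in zip(*masks)]


# Hypothetical toy example: three keypoints (x, y) forming two connected limbs.
keypoints = [(2, 2), (6, 2), (6, 6)]
limbs = [(0, 1), (1, 2)]
masks = [
    limb_probabilistic_mask(keypoints[i], keypoints[j], height=9, width=9)
    for i, j in limbs
]
finger_mask = compose_masks(masks)
```

Because the masks depend only on the annotated keypoints, a scheme like this needs no extra labels, which is consistent with the abstract's claim that structure learning requires no additional training information.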