Overlooked Poses Actually Make Sense: Distilling Privileged Knowledge for Human Motion Prediction

Paper Title

Overlooked Poses Actually Make Sense: Distilling Privileged Knowledge for Human Motion Prediction

Authors

Sun, Xiaoning; Cui, Qiongjie; Sun, Huaijiang; Li, Bin; Li, Weiqing; Lu, Jianfeng

Abstract

Previous works on human motion prediction follow the pattern of building a mapping between the observed sequence and the sequence to be predicted. However, due to the inherent complexity of multivariate time-series data, finding the extrapolation relation between motion sequences remains challenging. In this paper, we present a new prediction pattern that introduces previously overlooked human poses in order to perform the prediction task from the perspective of interpolation. These poses occur after the sequence to be predicted and form the privileged sequence. Specifically, we first propose an InTerPolation learning Network (ITP-Network) that encodes both the observed sequence and the privileged sequence to interpolate the in-between predicted sequence, while its embedded Privileged-sequence-Encoder (Priv-Encoder) simultaneously learns the privileged knowledge (PK). We then propose a Final Prediction Network (FP-Network), which cannot observe the privileged sequence but is equipped with a novel PK-Simulator that distills the PK learned by the previous network. This simulator takes the observed sequence as input but approximates the behavior of the Priv-Encoder, enabling FP-Network to imitate the interpolation process. Extensive experimental results demonstrate that our prediction pattern achieves state-of-the-art performance on the H3.6M, CMU-Mocap and 3DPW benchmark datasets for both short-term and long-term prediction.
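
To make the interpolation view concrete, here is a minimal Python sketch, assuming each pose is a flat list of joint coordinates. It splits a clip into observed, to-be-predicted and privileged windows, fills the gap with a naive linear blend, and combines a prediction loss with a feature-matching term of the kind a PK-Simulator could be trained with. The helper names (`split_windows`, `linear_interpolation_baseline`, `fp_training_loss`), the window lengths, the linear blend and the MSE distillation term weighted by `lam` are all hypothetical illustrations, not the ITP-Network, FP-Network or losses described in the paper.

```python
from typing import List, Sequence, Tuple

Pose = List[float]  # assumed layout: flattened joint coordinates for one frame


def split_windows(seq: Sequence[Pose], n_obs: int, n_pred: int, n_priv: int
                  ) -> Tuple[List[Pose], List[Pose], List[Pose]]:
    """Split a clip into observed, to-be-predicted and privileged windows.

    The privileged window holds the poses that come *after* the prediction
    target, so it only exists when the full clip is known (training time).
    """
    if len(seq) < n_obs + n_pred + n_priv:
        raise ValueError("sequence too short for the requested windows")
    observed = list(seq[:n_obs])
    target = list(seq[n_obs:n_obs + n_pred])
    privileged = list(seq[n_obs + n_pred:n_obs + n_pred + n_priv])
    return observed, target, privileged


def linear_interpolation_baseline(observed: List[Pose], privileged: List[Pose],
                                  n_pred: int) -> List[Pose]:
    """Hypothetical stand-in for interpolation: blend the last observed pose
    with the first privileged pose, frame by frame."""
    start, end = observed[-1], privileged[0]
    out = []
    for k in range(1, n_pred + 1):
        alpha = k / (n_pred + 1)  # fraction of the gap covered at frame k
        out.append([(1 - alpha) * s + alpha * e for s, e in zip(start, end)])
    return out


def mse(a: List[List[float]], b: List[List[float]]) -> float:
    """Mean squared error between two equally shaped lists of vectors."""
    diffs = [(x - y) ** 2 for va, vb in zip(a, b) for x, y in zip(va, vb)]
    return sum(diffs) / len(diffs)


def fp_training_loss(pred: List[Pose], target: List[Pose],
                     sim_feats: List[List[float]], priv_feats: List[List[float]],
                     lam: float = 0.1) -> float:
    """Assumed loss form: prediction error plus a term pulling the simulator's
    features (from observed poses only) toward the frozen Priv-Encoder features."""
    return mse(pred, target) + lam * mse(sim_feats, priv_feats)


if __name__ == "__main__":
    clip = [[float(t)] for t in range(10)]  # toy 1-D "joint" moving linearly
    obs, tgt, priv = split_windows(clip, n_obs=4, n_pred=3, n_priv=3)
    print(linear_interpolation_baseline(obs, priv, 3))  # [[4.0], [5.0], [6.0]]
```

On this toy clip the linear blend happens to recover the target exactly; on real motion data it will not, and it only illustrates why knowing a later pose constrains the in-between frames. Because the privileged window is unavailable at test time, the predictor in this setup must estimate the privileged features from the observed poses alone, which is the role the abstract assigns to the PK-Simulator.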
