A Spatio-temporal Transformer for 3D Human Motion Prediction

Paper title

A Spatio-temporal Transformer for 3D Human Motion Prediction

Authors

Emre Aksan, Manuel Kaufmann, Peng Cao, Otmar Hilliges

Abstract

We propose a novel Transformer-based architecture for the task of generative modelling of 3D human motion. Previous work commonly relies on RNN-based models considering shorter forecast horizons reaching a stationary and often implausible state quickly. Recent studies show that implicit temporal representations in the frequency domain are also effective in making predictions for a predetermined horizon. Our focus lies on learning spatio-temporal representations autoregressively and hence generation of plausible future developments over both short and long term. The proposed model learns high dimensional embeddings for skeletal joints and how to compose a temporally coherent pose via a decoupled temporal and spatial self-attention mechanism. Our dual attention concept allows the model to access current and past information directly and to capture both the structural and the temporal dependencies explicitly. We show empirically that this effectively learns the underlying motion dynamics and reduces error accumulation over time observed in auto-regressive models. Our model is able to make accurate short-term predictions and generate plausible motion sequences over long horizons. We make our code publicly available at https://github.com/eth-ait/motion-transformer.
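
The decoupled temporal and spatial self-attention mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (that lives in the repository linked above). The function names (`attention`, `init_params`, `spatio_temporal_block`), the single attention head, the projection weights shared across joints, and combining the two attention outputs by summation with a residual connection are all assumptions made for illustration. The input is assumed to be a tensor of joint embeddings with shape `(T, N, D)`: `T` frames, `N` joints, `D` embedding dimensions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v, mask=None):
    # Scaled dot-product attention over the second-to-last axis.
    # q, k, v: (..., L, D); mask: boolean (L, L), True where attending is allowed.
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    return softmax(scores, axis=-1) @ v


def init_params(d, rng):
    # Hypothetical parameter layout: one Q/K/V projection set per attention type.
    names = ["Wq_t", "Wk_t", "Wv_t", "Wq_s", "Wk_s", "Wv_s"]
    return {n: rng.normal(scale=d ** -0.5, size=(d, d)) for n in names}


def spatio_temporal_block(x, params):
    # x: (T, N, D) joint embeddings.
    T, N, D = x.shape

    # Temporal attention: each joint attends over its own current and past frames.
    xt = np.swapaxes(x, 0, 1)                      # (N, T, D)
    causal = np.tril(np.ones((T, T), dtype=bool))  # no access to future frames
    temporal = attention(
        xt @ params["Wq_t"], xt @ params["Wk_t"], xt @ params["Wv_t"], causal
    )
    temporal = np.swapaxes(temporal, 0, 1)         # back to (T, N, D)

    # Spatial attention: within each frame, every joint attends over all joints.
    spatial = attention(
        x @ params["Wq_s"], x @ params["Wk_s"], x @ params["Wv_s"]
    )

    # Assumed combination: residual plus the sum of both attention outputs.
    return x + temporal + spatial
```

Because the temporal attention is causally masked and the spatial attention stays within a single frame, the output at frame `t` depends only on frames `0..t`. That property is what allows a block like this to be used autoregressively, feeding each predicted pose back in as the next input.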
