Paper Title
Skeleton-based Action Recognition via Adaptive Cross-Form Learning
Paper Authors
Paper Abstract
Skeleton-based action recognition aims to project skeleton sequences to action categories, where skeleton sequences are derived from multiple forms of pre-detected points. Compared with earlier methods that focus on exploring single-form skeletons via Graph Convolutional Networks (GCNs), existing methods tend to improve GCNs by leveraging multi-form skeletons due to their complementary cues. However, these methods (either adapting the structure of GCNs or using model ensembles) require the co-existence of all forms of skeletons during both the training and inference stages, while a typical situation in real life is that only partial forms are available for inference. To tackle this issue, we present Adaptive Cross-Form Learning (ACFL), which empowers well-designed GCNs to generate complementary representations from single-form skeletons without changing model capacity. Specifically, each GCN model in ACFL not only learns action representations from single-form skeletons, but also adaptively mimics useful representations derived from other forms of skeletons. In this way, each GCN can learn how to strengthen what has been learned, thus exploiting the model's potential and facilitating action recognition. Extensive experiments conducted on three challenging benchmarks, i.e., NTU-RGB+D 120, NTU-RGB+D 60, and UAV-Human, demonstrate the effectiveness and generalizability of the proposed method. Specifically, ACFL significantly improves various GCN models (i.e., CTR-GCN, MS-G3D, and Shift-GCN), achieving a new record for skeleton-based action recognition.
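To make the cross-form mimicry idea concrete, below is a minimal NumPy sketch of a loss in the spirit of ACFL: each form-specific model is trained with its own classification loss plus an adaptively weighted term that pulls its predictions toward those of the other forms. This is not the authors' implementation; in particular, weighting the other forms by their confidence on the ground-truth class is an assumed stand-in for the paper's adaptive mechanism.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(p, label):
    """Cross-entropy of probability vector p against a ground-truth label."""
    return -np.log(p[label] + 1e-12)

def kl_divergence(p, q):
    """KL(p || q) between two probability vectors."""
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

def acfl_style_loss(probs, label, alpha=1.0):
    """Sketch of a cross-form learning objective.

    probs: list of class-probability vectors, one per skeleton form
           (e.g. joint, bone, motion).
    Each form's loss = its own cross-entropy + a weighted sum of KL
    terms that mimic the other forms' predictions. The weights here
    (confidence of each other form on the true class, renormalized)
    are an illustrative assumption, not the paper's exact scheme.
    """
    conf = np.array([p[label] for p in probs])
    losses = []
    for i, p in enumerate(probs):
        ce = cross_entropy(p, label)
        others = [k for k in range(len(probs)) if k != i]
        w = conf[others]
        w = w / (w.sum() + 1e-12)  # adaptive weights over the other forms
        mimic = sum(wj * kl_divergence(probs[j], p)
                    for wj, j in zip(w, others))
        losses.append(ce + alpha * mimic)
    return losses

# Example: three forms' predictions over three action classes.
probs = [softmax(np.array([2.0, 0.5, 0.1])),
         softmax(np.array([1.0, 1.5, 0.2])),
         softmax(np.array([0.3, 0.2, 2.5]))]
losses = acfl_style_loss(probs, label=0)
```

At inference time only a single form is needed: each model is used alone, having already absorbed the other forms' cues during training.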