Paper Title
Real-World Graph Convolution Networks (RW-GCNs) for Action Recognition in Smart Video Surveillance
Paper Authors
Paper Abstract
Action recognition is a key algorithmic component of emerging on-the-edge smart video surveillance and security systems. Skeleton-based action recognition is an attractive approach that, instead of using RGB pixel data, relies on human pose information to classify actions. However, existing algorithms often assume ideal conditions and ignore real-world limitations such as noisy input, latency requirements, and edge resource constraints. To address these limitations, this paper presents Real-World Graph Convolution Networks (RW-GCNs), an architecture-level solution for meeting the domain constraints of real-world skeleton-based action recognition. Inspired by the feedback connections present in the human visual cortex, RW-GCNs leverage attentive feedback augmentation on existing near state-of-the-art (SotA) Spatial-Temporal Graph Convolution Networks (ST-GCNs). The ST-GCNs' design choices are derived from information-theory-centric principles to address both the spatial and temporal noise typically encountered in end-to-end real-time and on-the-edge smart video systems. Our results demonstrate RW-GCNs' ability to serve these applications by achieving a new SotA accuracy of 94.1% on the NTU-RGB-D-120 dataset, and by delivering 32X lower latency than baseline ST-GCN applications while still reaching 90.4% accuracy on the Northwestern UCLA dataset in the presence of spatial keypoint noise. RW-GCNs further show system scalability by running on the 10X more cost-effective NVIDIA Jetson Nano (as opposed to the NVIDIA Xavier NX), while still maintaining a respectable throughput range (15.6 to 5.5 actions per second) on the resource-constrained device. The code is available here: https://github.com/TeCSAR-UNCC/RW-GCN.
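For readers unfamiliar with the idea of attentive feedback augmentation over a spatial graph convolution, the following minimal PyTorch sketch shows one plausible wiring: a deeper layer produces per-joint attention gates that re-weight the earlier features before a second pass. All class names, layer sizes, and the specific feedback wiring here are illustrative assumptions, not the authors' actual RW-GCN implementation (see the linked repository for the real code).

# Hypothetical sketch of attention-gated feedback over a spatial graph convolution.
# Layer names, sizes, and the feedback wiring are illustrative assumptions only.
import torch
import torch.nn as nn

class SpatialGraphConv(nn.Module):
    """One spatial graph convolution: relu(A @ X @ W), with A a normalized adjacency."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.linear = nn.Linear(in_channels, out_channels)

    def forward(self, x, adj):
        # x: (batch, num_joints, in_channels); adj: (num_joints, num_joints)
        return torch.relu(self.linear(adj @ x))

class AttentiveFeedbackBlock(nn.Module):
    """Two stacked graph convs with a feedback pass: the deeper features produce
    per-joint gates that re-weight the shallow features (a rough analogue of
    cortical feedback), which are then run through the deeper layer again."""
    def __init__(self, in_channels, hidden_channels):
        super().__init__()
        self.gcn1 = SpatialGraphConv(in_channels, hidden_channels)
        self.gcn2 = SpatialGraphConv(hidden_channels, hidden_channels)
        self.gate = nn.Sequential(nn.Linear(hidden_channels, 1), nn.Sigmoid())

    def forward(self, x, adj):
        h1 = self.gcn1(x, adj)          # feed-forward pass
        h2 = self.gcn2(h1, adj)
        alpha = self.gate(h2)           # (batch, num_joints, 1) attention gates
        h1_fb = alpha * h1              # feedback: deeper layer modulates earlier features
        return self.gcn2(h1_fb, adj)    # second pass over the attended features

if __name__ == "__main__":
    batch, joints, channels = 4, 25, 3  # e.g. 25 skeleton joints with (x, y, confidence)
    x = torch.randn(batch, joints, channels)
    adj = torch.eye(joints)             # placeholder adjacency (identity)
    block = AttentiveFeedbackBlock(channels, 64)
    print(block(x, adj).shape)          # torch.Size([4, 25, 64])

In a full model this kind of block would be stacked over both the spatial (joint) and temporal (frame) dimensions, as in ST-GCN; the sketch covers only the spatial pass to keep the feedback idea isolated.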