Self-Supervised Multimodal Fusion Transformer for Passive Activity Recognition

Paper title

Self-Supervised Multimodal Fusion Transformer for Passive Activity Recognition

Authors

Koupai, Armand K., Bocus, Mohammud J., Santos-Rodriguez, Raul, Piechocki, Robert J., McConville, Ryan

Abstract

The pervasiveness of Wi-Fi signals provides significant opportunities for human sensing and activity recognition in fields such as healthcare. The sensors most commonly used for passive Wi-Fi sensing are based on passive Wi-Fi radar (PWR) and channel state information (CSI) data; however, current systems do not effectively exploit the information acquired through multiple sensors to recognise the different activities. In this paper, we explore new properties of the Transformer architecture for multimodal sensor fusion. We study different signal processing techniques to extract multiple image-based features from PWR and CSI data, such as spectrograms, scalograms and Markov transition fields (MTF). We first propose the Fusion Transformer, an attention-based model for multimodal and multi-sensor fusion. Experimental results show that our Fusion Transformer approach can achieve competitive results compared to a ResNet architecture but with far fewer resources. To further improve our model, we propose a simple and effective framework for multimodal and multi-sensor self-supervised learning (SSL). The self-supervised Fusion Transformer outperforms the baselines, achieving an F1-score of 95.9%. Finally, we show how this approach significantly outperforms the others when trained with as little as 1% (2 minutes) and up to 20% (40 minutes) of the labelled training data.
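
Of the image-based features named in the abstract, the Markov transition field is probably the least familiar, so a minimal sketch of the general technique may help. This is not the authors' pipeline, which the abstract does not describe. The function name `markov_transition_field`, the quantile binning and the default of 8 bins are all assumptions made for illustration, and the input is a generic 1-D series rather than real PWR or CSI data.

```python
import numpy as np

def markov_transition_field(x, n_bins=8):
    """Illustrative MTF of a 1-D series (hypothetical helper, not from the paper)."""
    x = np.asarray(x, dtype=float)
    # Interior quantile edges, so each bin holds roughly the same number of samples.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    # Map every sample to a discrete state in 0..n_bins-1.
    states = np.digitize(x, edges)
    # Count first-order transitions between consecutive samples.
    W = np.zeros((n_bins, n_bins))
    np.add.at(W, (states[:-1], states[1:]), 1)
    # Row-normalise counts into transition probabilities (unvisited rows stay zero).
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    # Entry (k, l) is the probability of moving from the state at time k
    # to the state at time l, which gives a len(x) by len(x) image.
    return W[states[:, None], states[None, :]]

t = np.linspace(0, 1, 256)
signal = np.sin(2 * np.pi * 5 * t)  # synthetic stand-in for a sensor trace
mtf = markov_transition_field(signal, n_bins=8)
print(mtf.shape)  # (256, 256)
```

The resulting square array can be treated as a single-channel image. In practice, long recordings are usually downsampled or averaged in blocks before this step, since the output grows quadratically with the series length.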
