Paper Title
Semantic Labeling of Human Action For Visually Impaired And Blind People Scene Interaction
Paper Authors
Paper Abstract
The aim of this work is to contribute to the development of a tactile device for visually impaired and blind persons, in order to let them understand the actions of surrounding people and interact with them. First, building on state-of-the-art methods for human action recognition from RGB-D sequences, we use the skeleton information provided by the Kinect together with the disentangled and unified multi-scale Graph Convolutional (MS-G3D) model to recognize the performed actions. We tested this model on real scenes and identified some constraints and limitations. Second, we fuse the skeleton modality processed by MS-G3D with the depth modality processed by a CNN in order to bypass the discussed limitations. Third, the recognized actions are labeled semantically and mapped onto an output device perceivable through the sense of touch.
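The abstract does not specify how the skeleton and depth streams are combined. A common way to realize such a two-stream fusion is weighted late fusion of the per-class scores from each model; the PyTorch sketch below illustrates this under that assumption. The function name `fuse_action_scores` and the weight `alpha` are hypothetical illustrations, not the authors' stated method.

```python
import torch
import torch.nn.functional as F


def fuse_action_scores(skeleton_logits: torch.Tensor,
                       depth_logits: torch.Tensor,
                       alpha: float = 0.6) -> torch.Tensor:
    """Weighted late fusion of per-class scores from two modalities.

    skeleton_logits: (batch, num_classes) outputs of the skeleton model (e.g. MS-G3D)
    depth_logits:    (batch, num_classes) outputs of the depth CNN
    alpha:           weight given to the skeleton stream (hypothetical value)
    """
    # Convert each stream's logits to probabilities, then blend them.
    skeleton_probs = F.softmax(skeleton_logits, dim=-1)
    depth_probs = F.softmax(depth_logits, dim=-1)
    return alpha * skeleton_probs + (1.0 - alpha) * depth_probs


if __name__ == "__main__":
    # Stand-ins for real model outputs: two clips, 60 action classes.
    skel = torch.randn(2, 60)   # would come from the MS-G3D skeleton stream
    depth = torch.randn(2, 60)  # would come from the depth-CNN stream
    fused = fuse_action_scores(skel, depth)
    predicted_actions = fused.argmax(dim=-1)  # fused class index per clip
    print(predicted_actions)
```

The fused class index would then be the action whose semantic label is sent to the tactile output device.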