Paper title
Detecting Human-to-Human-or-Object (H2O) Interactions with DIABOLO
Paper authors
Abstract
Detecting human interactions is crucial for human behavior analysis. Many methods have been proposed for Human-to-Object Interaction (HOI) detection, i.e., detecting which person and object interact in an image and classifying the type of interaction. However, Human-to-Human Interactions, such as social and violent interactions, are generally not considered in available HOI training datasets. Because we believe these types of interactions can be neither ignored nor decoupled from HOI when analyzing human behavior, we propose a new interaction dataset that covers both types of human interactions: Human-to-Human-or-Object (H2O). In addition, we introduce a novel taxonomy of verbs, intended to be closer to a description of human body attitude in relation to the surrounding targets of interaction, and more independent of the environment. Unlike some existing datasets, we strive to avoid defining synonymous verbs when their use highly depends on the target type or requires a high level of semantic interpretation. Because the H2O dataset includes V-COCO images annotated with this new taxonomy, the images naturally contain more interactions. This can be an issue for HOI detection methods whose complexity depends on the number of people, targets, or interactions. Thus, we propose DIABOLO (Detecting InterActions By Only Looking Once), an efficient subject-centric single-shot method that detects all interactions in one forward pass, with constant inference time independent of image content. In addition, this multi-task network simultaneously detects all people and objects. We show that sharing a network for these tasks not only saves computational resources but also improves performance collaboratively. Finally, DIABOLO is a strong baseline for the newly proposed challenge of H2O interaction detection, as it outperforms all state-of-the-art methods when trained and evaluated on the HOI dataset V-COCO.