Paper Title

MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection

Paper Authors

Bumsoo Kim, Jonghwan Mun, Kyoung-Woon On, Minchul Shin, Junhyun Lee, Eun-Sol Kim

Paper Abstract

Human-Object Interaction (HOI) detection is the task of identifying a set of <human, object, interaction> triplets from an image. Recent work proposed transformer encoder-decoder architectures that successfully eliminated the need for many hand-designed components in HOI detection through end-to-end training. However, they are limited to single-scale feature resolution, providing suboptimal performance in scenes containing humans, objects and their interactions with vastly different scales and distances. To tackle this problem, we propose a Multi-Scale TRansformer (MSTR) for HOI detection powered by two novel HOI-aware deformable attention modules called Dual-Entity attention and Entity-conditioned Context attention. While existing deformable attention comes at a huge cost in HOI detection performance, our proposed attention modules of MSTR learn to effectively attend to sampling points that are essential to identify interactions. In experiments, we achieve the new state-of-the-art performance on two HOI detection benchmarks.
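
The abstract describes HOI-aware deformable attention over multi-scale features, in which queries attend only to a small set of learned sampling points rather than to the full feature map. The sketch below is a minimal, hypothetical PyTorch illustration of that basic mechanism (it is not the authors' implementation, and all names and shapes are illustrative assumptions): each query predicts sampling offsets around a reference point and aggregates the sampled features with learned weights. MSTR's Dual-Entity attention and Entity-conditioned Context attention are described as HOI-aware variants of this idea that learn sampling points relevant to the human, the object, and their interaction context.

```python
# Minimal sketch of deformable-attention-style sampling (single feature level;
# multi-scale variants repeat this per level). Hypothetical names and shapes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableSamplingSketch(nn.Module):
    def __init__(self, dim=256, n_points=4):
        super().__init__()
        self.offset_head = nn.Linear(dim, n_points * 2)   # predicts (dx, dy) per sampling point
        self.weight_head = nn.Linear(dim, n_points)       # predicts an attention weight per point
        self.n_points = n_points

    def forward(self, query, feat_map, ref_point):
        """
        query:     (B, Q, C)    decoder queries (e.g., HOI queries)
        feat_map:  (B, C, H, W) one feature level
        ref_point: (B, Q, 2)    normalized (x, y) reference locations in [0, 1]
        """
        B, Q, C = query.shape
        offsets = self.offset_head(query).view(B, Q, self.n_points, 2)
        weights = self.weight_head(query).softmax(-1)              # (B, Q, P)

        # Sampling locations around the reference point, mapped to the
        # [-1, 1] grid coordinates expected by grid_sample.
        loc = (ref_point.unsqueeze(2) + 0.05 * offsets).clamp(0, 1) * 2 - 1
        sampled = F.grid_sample(
            feat_map, loc, mode="bilinear", align_corners=False
        )                                                          # (B, C, Q, P)
        sampled = sampled.permute(0, 2, 3, 1)                      # (B, Q, P, C)

        # Weighted aggregation of the sampled features per query.
        return (weights.unsqueeze(-1) * sampled).sum(dim=2)        # (B, Q, C)
```

In this sketch, the cost per query scales with the handful of sampling points rather than with the full spatial resolution, which is what makes attending to multi-scale feature maps tractable; the paper's contribution lies in how the sampling is conditioned on the human and object entities so that the saved computation does not hurt interaction recognition.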
