Paper Title
Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN
Paper Authors
Paper Abstract
The dominant object detection approaches treat each dataset separately and fit towards a specific domain, which cannot adapt to other domains without extensive retraining. In this paper, we address the problem of designing a universal object detection model that exploits diverse category granularity from multiple domains and predicts all kinds of categories in one system. Existing works treat this problem by integrating multiple detection branches upon one shared backbone network. However, this paradigm overlooks the crucial semantic correlations between multiple domains, such as category hierarchy, visual similarity, and linguistic relationships. To address these drawbacks, we present a novel universal object detector called Universal-RCNN that incorporates graph transfer learning for propagating relevant semantic information across multiple datasets to reach semantic coherency. Specifically, we first generate a global semantic pool by integrating the high-level semantic representations of all the categories. Then an Intra-Domain Reasoning Module learns and propagates the sparse graph representation within one dataset guided by a spatial-aware GCN. Finally, an Inter-Domain Transfer Module is proposed to exploit diverse transfer dependencies across all domains and enhance the regional feature representation by attending and transferring semantic contexts globally. Extensive experiments demonstrate that the proposed method significantly outperforms multiple-branch models and achieves state-of-the-art results on multiple object detection benchmarks (mAP: 49.1% on COCO).
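To make the graph-transfer idea more concrete, the snippet below is a minimal, illustrative PyTorch sketch (not the authors' released code) of the two ingredients the abstract describes: attending region features over a global semantic pool and propagating the attended context with a single GCN-style layer. All class, parameter, and dimension names (SemanticGraphTransfer, feat_dim, sem_dim, the toy adjacency) are assumptions made for illustration only.

```python
# Illustrative sketch only: attending region features over a global semantic
# pool and propagating the attended context with one GCN-style layer.
# Names and dimensions are assumptions, not the paper's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticGraphTransfer(nn.Module):
    """Toy version of semantic-pool attention + graph propagation over regions."""

    def __init__(self, feat_dim: int, num_classes: int, sem_dim: int = 256):
        super().__init__()
        # Global semantic pool: one learnable vector per category across domains.
        self.semantic_pool = nn.Parameter(torch.randn(num_classes, sem_dim))
        self.query = nn.Linear(feat_dim, sem_dim)        # region -> query space
        self.gcn_weight = nn.Linear(sem_dim, feat_dim)   # GCN-style transform
        self.fuse = nn.Linear(feat_dim * 2, feat_dim)    # fuse context back in

    def forward(self, region_feats: torch.Tensor, adjacency: torch.Tensor):
        # region_feats: (num_regions, feat_dim); adjacency: (num_regions, num_regions)
        # 1) Attend each region over the global semantic pool.
        attn = F.softmax(self.query(region_feats) @ self.semantic_pool.t(), dim=-1)
        context = attn @ self.semantic_pool              # (num_regions, sem_dim)

        # 2) Propagate the attended context along a row-normalized region graph.
        norm_adj = adjacency / adjacency.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        propagated = F.relu(self.gcn_weight(norm_adj @ context))

        # 3) Fuse the propagated semantic context back into the region features.
        return self.fuse(torch.cat([region_feats, propagated], dim=-1))


if __name__ == "__main__":
    regions = torch.randn(8, 1024)   # 8 region proposals, 1024-d features
    adj = torch.ones(8, 8)           # fully connected toy graph
    enhanced = SemanticGraphTransfer(1024, 80)(regions, adj)
    print(enhanced.shape)            # torch.Size([8, 1024])
```

In the paper, the graph structure is learned and spatial-aware, and the transfer additionally spans categories from different source domains; the sketch above only shows the general attend-propagate-fuse pattern on a fixed toy graph.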