Paper Title


TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation

Authors

Pengfei Li, Beiwen Tian, Yongliang Shi, Xiaoxue Chen, Hao Zhao, Guyue Zhou, Ya-Qin Zhang

Abstract


Current referring expression comprehension algorithms can effectively detect or segment objects indicated by nouns, but how to understand verb reference is still under-explored. As such, we study the challenging problem of task oriented detection, which aims to find objects that best afford an action indicated by verbs like sit comfortably on. Towards a finer localization that better serves downstream applications like robot interaction, we extend the problem into task oriented instance segmentation. A unique requirement of this task is to select preferred candidates among possible alternatives. Thus we resort to the transformer architecture which naturally models pair-wise query relationships with attention, leading to the TOIST method. In order to leverage pre-trained noun referring expression comprehension models and the fact that we can access privileged noun ground truth during training, a novel noun-pronoun distillation framework is proposed. Noun prototypes are generated in an unsupervised manner and contextual pronoun features are trained to select prototypes. As such, the network remains noun-agnostic during inference. We evaluate TOIST on the large-scale task oriented dataset COCO-Tasks and achieve +10.9% higher $\rm{mAP^{box}}$ than the best-reported results. The proposed noun-pronoun distillation can boost $\rm{mAP^{box}}$ and $\rm{mAP^{mask}}$ by +2.8% and +3.8%. Codes and models are publicly available at https://github.com/AIR-DISCOVER/TOIST.
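The abstract states that noun prototypes are generated in an unsupervised manner and that contextual pronoun features are trained to select a prototype, so the network stays noun-agnostic at inference. A minimal sketch of that idea is below; it is not the authors' implementation (see the linked repository for that), and the k-means clustering, plain L2 distillation loss, and all function names are illustrative assumptions:

```python
import numpy as np

def make_noun_prototypes(noun_feats, k, iters=20, seed=0):
    """Cluster noun features into k prototypes with plain k-means,
    an unsupervised stand-in for the paper's prototype generation.
    noun_feats: (N, D) array of noun-branch features."""
    rng = np.random.default_rng(seed)
    centers = noun_feats[rng.choice(len(noun_feats), size=k, replace=False)]
    for _ in range(iters):
        # Assign each feature to its nearest center, then recompute centers.
        dists = np.linalg.norm(noun_feats[:, None] - centers[None], axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = noun_feats[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def distill_loss(pronoun_feat, prototypes):
    """Select the nearest noun prototype for a contextual pronoun feature
    and return a simple L2 distillation loss pulling the pronoun feature
    toward it. Noun ground truth is only needed to build the prototypes,
    so inference can run on the pronoun branch alone."""
    dists = np.linalg.norm(prototypes - pronoun_feat, axis=-1)
    target = prototypes[dists.argmin()]
    return float(((pronoun_feat - target) ** 2).mean())
```

In this sketch the privileged noun ground truth is consumed only while building `prototypes` during training; at test time the pronoun feature is matched against fixed prototypes, mirroring the noun-agnostic inference the abstract describes.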
