Paper Title
DoRO: Disambiguation of referred object for embodied agents
Paper Authors
Paper Abstract
Robotic task instructions often involve a referred object that the robot must locate (ground) within the environment. While task intent understanding is an essential part of natural language understanding, less effort has been made to resolve the ambiguity that may arise while grounding the task. Existing works use vision-based task grounding and ambiguity detection, suitable for a fixed view and a static robot. However, the problem is magnified for a mobile robot, where the ideal view is not known beforehand. Moreover, a single view may not be sufficient to locate all the object instances in the given area, which leads to inaccurate ambiguity detection. Human intervention is helpful only if the robot can convey the kind of ambiguity it is facing. In this article, we present DoRO (Disambiguation of Referred Object), a system that helps an embodied agent disambiguate the referred object by raising a suitable query whenever required. Given an area where the intended object is located, DoRO finds all the instances of the object by aggregating observations from multiple views while exploring and scanning the area. It then raises a suitable query using the information from the grounded object instances. Experiments conducted with the AI2Thor simulator show that DoRO not only detects ambiguity more accurately but also raises verbose queries with more accurate information from the visual-language grounding.
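The abstract describes two technical steps: aggregating per-view observations into distinct object instances, and raising a query based on the grounded instances. The Python sketch below illustrates that general idea under simplified assumptions; `ViewDetection`, `aggregate_instances`, `disambiguation_query`, and `MERGE_RADIUS` are hypothetical names and thresholds, not DoRO's actual pipeline, which fuses multi-view detections and generates queries in its own way.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

MERGE_RADIUS = 0.5  # metres; assumed threshold for treating two detections as the same instance


@dataclass
class ViewDetection:
    """A single detection of an object from one view (hypothetical format)."""
    label: str                             # detected object class, e.g. "cup"
    position: Tuple[float, float, float]   # estimated 3D position in the map frame
    attributes: Set[str]                   # e.g. {"red", "on the table"}


def aggregate_instances(detections: List[ViewDetection], target_label: str) -> List[Dict]:
    """Merge per-view detections of `target_label` into distinct object instances."""
    instances: List[Dict] = []
    for det in detections:
        if det.label != target_label:
            continue
        for inst in instances:
            dx = det.position[0] - inst["position"][0]
            dy = det.position[1] - inst["position"][1]
            dz = det.position[2] - inst["position"][2]
            if (dx * dx + dy * dy + dz * dz) ** 0.5 < MERGE_RADIUS:
                inst["attributes"] |= det.attributes  # same physical object seen from another view
                break
        else:
            instances.append({"position": det.position, "attributes": set(det.attributes)})
    return instances


def disambiguation_query(instances: List[Dict], target_label: str) -> str:
    """Return a query for the user when grounding fails or is ambiguous, else an empty string."""
    if not instances:
        return f"I could not find any {target_label} in this area. Could you point me to it?"
    if len(instances) == 1:
        return ""  # unambiguous grounding: no query needed
    descriptions = ["the " + " ".join(sorted(inst["attributes"]) + [target_label])
                    for inst in instances]
    return (f"I found {len(instances)} instances of {target_label}. "
            "Which one do you mean: " + ", or ".join(descriptions) + "?")


if __name__ == "__main__":
    views = [
        ViewDetection("cup", (1.0, 0.2, 0.9), {"red"}),
        ViewDetection("cup", (1.1, 0.25, 0.9), {"red", "on the table"}),  # same cup, second view
        ViewDetection("cup", (3.0, 0.0, 0.9), {"white"}),
    ]
    found = aggregate_instances(views, "cup")
    print(disambiguation_query(found, "cup"))
```

Running the sketch merges the first two detections into one instance (they fall within the assumed merge radius) and keeps the white cup as a second instance, so the ambiguity query lists both candidates; with a single surviving instance no query would be raised.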