Object Memory Transformer for Object Goal Navigation

Paper Title

Object Memory Transformer for Object Goal Navigation

Authors

Fukushima, Rui; Ota, Kei; Kanezaki, Asako; Sasaki, Yoko; Yoshiyasu, Yusuke

Abstract

This paper presents a reinforcement learning method for object goal navigation (ObjNav), in which an agent navigates 3D indoor environments to reach a target object based on long-term observations of objects and scenes. To this end, we propose the Object Memory Transformer (OMT), which consists of two key ideas: 1) an Object-Scene Memory (OSM) that stores long-term scene and object semantics, and 2) a Transformer that attends to salient objects in the sequence of previously observed scenes and objects stored in the OSM. This mechanism allows the agent to navigate efficiently in indoor environments without prior knowledge of the environment, such as topological maps or 3D meshes. To the best of our knowledge, this is the first work that uses a long-term memory of object semantics in a goal-oriented navigation task. Experiments on the AI2-THOR dataset show that OMT outperforms previous approaches in navigating unknown environments. In particular, we show that utilizing long-term object semantic information improves navigation efficiency.
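
The abstract describes the mechanism only at a high level, so the following is a minimal sketch of the general idea and not the authors' implementation. It assumes the memory is a fixed-length first-in-first-out buffer of per-step feature vectors and that attention is plain scaled dot-product attention with the stored features serving as both keys and values, with no learned projections. The names `ObjectSceneMemory`, `store`, and `attend` are hypothetical and do not come from the paper.

```python
import math
from collections import deque


class ObjectSceneMemory:
    """Hypothetical fixed-length buffer of per-step feature vectors."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest step once capacity is reached.
        self.steps = deque(maxlen=capacity)

    def store(self, features):
        self.steps.append(list(features))


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Scaled dot-product attention of one query over all stored steps.

    Assumes the memory is non-empty and every stored vector has the
    same length as the query.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, step)) / scale
        for step in memory.steps
    ]
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of stored features.
    context = [
        sum(w * step[i] for w, step in zip(weights, memory.steps))
        for i in range(len(query))
    ]
    return weights, context


if __name__ == "__main__":
    memory = ObjectSceneMemory(capacity=3)
    for features in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]):
        memory.store(features)  # the first vector is evicted by the fourth
    weights, context = attend([1.0, 0.0], memory)
    print(weights, context)
```

In this toy setup the stored step most similar to the query receives the largest weight, which is the sense in which attention can pick out salient past observations. A real model would add learned query, key, and value projections and feed the context vector to a policy network.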
