Comparison of Model-Free and Model-Based Learning-Informed Planning for PointGoal Navigation

Paper Title

Comparison of Model-Free and Model-Based Learning-Informed Planning for PointGoal Navigation

Authors

Li, Yimeng; Debnath, Arnab; Stein, Gregory J.; Kosecka, Jana

Abstract

In recent years, several learning approaches to point goal navigation in previously unseen environments have been proposed. They vary in their representations of the environment, problem decomposition, and experimental evaluation. In this work, we compare state-of-the-art deep reinforcement learning approaches with a Partially Observable Markov Decision Process (POMDP) formulation of the point goal navigation problem. We adapt the POMDP sub-goal framework proposed by [1] and modify the component that estimates frontier properties to use partial semantic maps of indoor scenes built from semantic segmentation of images. In addition to the well-known completeness of the model-based approach, we demonstrate that it is robust and efficient: it leverages informative, learned properties of the frontiers, in contrast to an optimistic frontier-based planner. We also demonstrate its data efficiency compared to end-to-end deep reinforcement learning approaches. We compare our results against an optimistic planner, ANS, and DD-PPO on the Matterport3D dataset using the Habitat simulator. We show comparable, though slightly worse, performance than the state-of-the-art DD-PPO approach, yet with far less data.
