Paper Title
Task-Directed Exploration in Continuous POMDPs for Robotic Manipulation of Articulated Objects
Paper Authors
Paper Abstract
Representing and reasoning about uncertainty is crucial for autonomous agents acting in partially observable environments with noisy sensors. Partially observable Markov decision processes (POMDPs) serve as a general framework for representing problems in which uncertainty is an important factor. Online sample-based POMDP methods have emerged as efficient approaches to solving large POMDPs and have been shown to extend to continuous domains. However, these solvers struggle to find long-horizon plans in problems with significant uncertainty. Exploration heuristics can help guide planning, but many real-world settings contain significant task-irrelevant uncertainty that can distract from the task objective. In this paper, we propose STRUG, an online POMDP solver capable of handling domains that require long-horizon planning under significant task-relevant and task-irrelevant uncertainty. We demonstrate our solution on several temporally extended versions of toy POMDP problems, as well as on robotic manipulation of articulated objects using a neural perception front end to construct a distribution over possible models. Our results show that STRUG outperforms current sample-based online POMDP solvers on several tasks.
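For readers unfamiliar with the sample-based online POMDP methods the abstract refers to, the sketch below illustrates their common core: approximating the belief (a distribution over hidden states) with particles and updating it after each action and observation. This is a generic illustration, not STRUG's actual implementation; the function names `transition_fn` and `observation_likelihood_fn` are assumed placeholders for a problem's dynamics and sensor model.

```python
import random

def update_belief(particles, action, observation,
                  transition_fn, observation_likelihood_fn,
                  n_particles=1000):
    """Approximate Bayes-filter belief update with weighted particles.

    particles: list of sampled states representing the current belief.
    transition_fn(state, action) -> next_state (samples the dynamics).
    observation_likelihood_fn(obs, next_state, action) -> P(obs | s', a).
    """
    # Propagate each particle through the (stochastic) dynamics model.
    propagated = [transition_fn(s, action) for s in particles]

    # Weight each propagated particle by how well it explains
    # the observation that was actually received.
    weights = [observation_likelihood_fn(observation, s, action)
               for s in propagated]

    if sum(weights) == 0:
        # Observation inconsistent with every particle; fall back to
        # the propagated (prior) belief rather than dividing by zero.
        return propagated

    # Resample to obtain an unweighted particle set for the new belief.
    return random.choices(propagated, weights=weights, k=n_particles)
```

An online solver repeats this update at every step, planning from the current particle set; long-horizon problems are hard precisely because task-relevant uncertainty must be reduced over many such steps.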