Title
MAGIC: Learning Macro-Actions for Online POMDP Planning
Authors
Abstract
The partially observable Markov decision process (POMDP) is a principled general framework for robot decision making under uncertainty, but POMDP planning suffers from high computational complexity when long-term planning is required. While temporally extended macro-actions help to cut down the effective planning horizon and significantly improve computational efficiency, how do we acquire good macro-actions? This paper proposes Macro-Action Generator-Critic (MAGIC), which performs offline learning of macro-actions optimized for online POMDP planning. Specifically, MAGIC learns a macro-action generator end-to-end, using an online planner's performance as feedback. During online planning, the generator produces situation-aware macro-actions on the fly, conditioned on the robot's belief and the environment context. We evaluated MAGIC on several long-horizon planning tasks, both in simulation and on a real robot. The experimental results show that the learned macro-actions offer significant benefits in online planning performance, compared with primitive actions and handcrafted macro-actions.
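To make the generator-critic idea in the abstract concrete, the following is a minimal toy sketch, not the paper's actual implementation: a linear "generator" maps a belief/context feature vector to macro-action parameters, a linear "critic" on quadratic features regresses the planner's observed return, and the generator ascends the critic's value estimate. The `planner_return` stand-in, the feature dimensions, and all names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
CTX, MACRO = 4, 2               # context features (last one is a bias term), macro-action dims
TARGET = np.array([1.0, -0.5])  # hypothetical "good" macro-action parameters

def planner_return(m):
    """Stand-in for running the online POMDP planner with macro-action
    parameters m and measuring its performance (higher is better)."""
    return -float(np.sum((m - TARGET) ** 2))

def sample_context():
    """Stand-in for a belief/context feature vector (with a bias feature)."""
    return np.concatenate([rng.normal(size=CTX - 1), [1.0]])

def critic_features(m):
    # Quadratic features let a linear critic fit the toy planner return exactly.
    return np.concatenate([m, m * m, [1.0]])

W_gen = np.zeros((MACRO, CTX))   # linear generator: m = W_gen @ x
w_crit = np.zeros(2 * MACRO + 1) # linear critic on quadratic features

for step in range(4000):
    x = sample_context()
    m = W_gen @ x
    # Explore around the generator's proposal and query the planner for feedback.
    m_noisy = m + 0.3 * rng.normal(size=MACRO)
    ret = planner_return(m_noisy)
    # Critic: one SGD step of least-squares regression onto the observed return.
    f = critic_features(m_noisy)
    w_crit -= 0.05 * (w_crit @ f - ret) * f
    if step >= 500:
        # Generator: after a critic warm-up, ascend the critic's value estimate;
        # for these features, dV/dm = w_lin + 2 * w_quad * m (clipped for stability).
        dV_dm = w_crit[:MACRO] + 2.0 * w_crit[MACRO:2 * MACRO] * m
        W_gen += 0.02 * np.outer(np.clip(dV_dm, -1.0, 1.0), x)

final_return = planner_return(W_gen @ sample_context())
```

In this toy setting the generator's output drifts toward the high-return macro-action parameters, illustrating how planner performance alone can drive the offline learning loop; the actual method uses neural networks and a full online POMDP planner in place of these linear maps.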