Paper Title
Improved POMDP Tree Search Planning with Prioritized Action Branching
Paper Authors
Paper Abstract
Online solvers for partially observable Markov decision processes have difficulty scaling to problems with large action spaces. This paper proposes a method called PA-POMCPOW to sample a subset of the action space that provides varying mixtures of exploitation and exploration for inclusion in a search tree. The proposed method first evaluates the action space according to a score function that is a linear combination of expected reward and expected information gain. The actions with the highest score are then added to the search tree during tree expansion. Experiments show that PA-POMCPOW is able to outperform existing state-of-the-art solvers on problems with large discrete action spaces.
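The abstract describes ranking actions by a score that linearly combines expected reward (exploitation) and expected information gain (exploration), then branching the search tree only on the top-scoring actions. The sketch below illustrates that idea in minimal form; the function names, the mixing weight `lam`, the number of branched actions `k`, and the toy reward/information-gain estimators are illustrative assumptions, not the paper's actual PA-POMCPOW implementation.

```python
# Minimal sketch of prioritized action branching under assumed placeholder models.
import heapq
import random


def expected_reward(belief, action):
    # Placeholder: mean immediate reward of `action` over sampled belief particles.
    return sum(s * action for s in belief) / len(belief)


def expected_information_gain(belief, action):
    # Placeholder: a crude stand-in for how much `action` is expected to
    # reduce belief uncertainty (random noise here, purely for illustration).
    return random.random()


def score(belief, action, lam=0.5):
    # Linear combination of exploitation and exploration terms,
    # as stated in the abstract; `lam` is an assumed mixing weight.
    return expected_reward(belief, action) + lam * expected_information_gain(belief, action)


def prioritized_actions(belief, actions, k=5):
    # Keep only the k highest-scoring actions for inclusion in the
    # search tree during node expansion.
    return heapq.nlargest(k, actions, key=lambda a: score(belief, a))


if __name__ == "__main__":
    belief = [random.random() for _ in range(100)]  # sampled belief particles
    actions = list(range(50))                       # a large discrete action space
    print(prioritized_actions(belief, actions))
```

In this sketch the score is recomputed per expansion from the current belief; restricting branching to the top-k actions is what keeps the tree tractable when the action space is large.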