Paper Title
Finding the Near Optimal Policy via Adaptive Reduced Regularization in MDPs
Paper Authors
Paper Abstract
Regularized MDPs serve as a smoothed version of the original MDPs. However, the optimal policy of a regularized MDP is always biased relative to that of the original MDP. Instead of making the coefficient λ of the regularization term sufficiently small, we propose an adaptive reduction scheme for λ to approximate the optimal policy of the original MDP. We show that this scheme reduces the iteration complexity of obtaining an ε-optimal policy in comparison with setting a sufficiently small λ. In addition, there is a strong duality connection between the reduction method and solving the original MDP directly, from which we can derive further adaptive reduction methods for certain algorithms.
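To make the idea of "adaptive reduction" concrete, here is a minimal illustrative sketch in Python, assuming entropy (log-sum-exp) regularization as the smoothing: soft value iteration in which the coefficient λ is halved across phases with warm starts, rather than fixed at a single tiny value. The function names, the halving schedule, and the choice of entropy regularizer are assumptions for illustration, not the paper's actual algorithm or guarantees.

```python
import numpy as np

def soft_value_iteration(P, r, gamma, lam, V=None, n_iters=100):
    """Value iteration for an entropy-regularized MDP (illustrative).

    P: transitions, shape (S, A, S); r: rewards, shape (S, A).
    For lam > 0 the soft Bellman backup is a log-sum-exp smoothing of
    the max, so its fixed point is a biased version of the original V*.
    """
    S, A = r.shape
    if V is None:
        V = np.zeros(S)
    for _ in range(n_iters):
        Q = r + gamma * (P @ V)                 # (S, A) action values
        Qmax = Q.max(axis=1)                    # stabilize the log-sum-exp
        V = Qmax + lam * np.log(np.exp((Q - Qmax[:, None]) / lam).sum(axis=1))
    Q = r + gamma * (P @ V)
    pi = np.exp((Q - Q.max(axis=1, keepdims=True)) / lam)  # softmax policy
    return V, pi / pi.sum(axis=1, keepdims=True)

def adaptive_reduction(P, r, gamma, lam0=1.0, n_phases=8, iters_per_phase=50):
    """Hypothetical reduction schedule: halve lam each phase and warm-start
    from the previous phase, instead of fixing one tiny lam from the start."""
    lam, V, pi = lam0, None, None
    for _ in range(n_phases):
        V, pi = soft_value_iteration(P, r, gamma, lam, V, iters_per_phase)
        lam *= 0.5                              # adaptively reduce lam
    return V, pi

# Usage on a random 5-state, 3-action MDP.
rng = np.random.default_rng(0)
P = rng.random((5, 3, 5)); P /= P.sum(axis=2, keepdims=True)
r = rng.random((5, 3))
V, pi = adaptive_reduction(P, r, gamma=0.9)
```

Warm-starting each phase from the previous value function is what plausibly saves iterations compared with running soft value iteration once at a tiny λ: strongly regularized early phases converge quickly, and later phases with smaller λ only refine the result.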