Paper title
Primal-dual regression approach for Markov decision processes with general state and action space
Paper authors
Abstract
We develop a regression-based primal-dual martingale approach for solving finite time horizon MDPs with general state and action space. As a result, our method allows for the construction of tight upper- and lower-biased approximations of the value functions, and provides tight approximations to the optimal policy. In particular, we prove tight error bounds for the estimated duality gap featuring polynomial dependence on the time horizon, and sublinear dependence on the cardinality/dimension of the possibly infinite state and action space. From a computational point of view, the proposed method is efficient since, in contrast to the usual duality-based methods for optimal control problems in the literature, the Monte Carlo procedures involved here do not require nested simulations.