Paper title

Primal-dual regression approach for Markov decision processes with general state and action space

Authors

Denis Belomestny, John Schoenmakers

Abstract

We develop a regression-based primal-dual martingale approach for solving finite time horizon MDPs with general state and action space. As a result, our method allows for the construction of tight upper and lower biased approximations of the value functions, and provides tight approximations to the optimal policy. In particular, we prove tight error bounds for the estimated duality gap featuring polynomial dependence on the time horizon and sublinear dependence on the cardinality/dimension of the possibly infinite state and action space. From a computational point of view, the proposed method is efficient since, in contrast to the usual duality-based methods for optimal control problems in the literature, the Monte Carlo procedures involved here do not require nested simulations.
