Paper Title
Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation
Paper Authors
Paper Abstract
We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes. The variant combines stochastic approximation algorithms with state-of-the-art techniques that are useful for very large MDPs, including lookahead, function approximation, and gradient descent. Specifically, we analyze two algorithms: the first uses a least squares approach, obtaining a new set of weights associated with the feature vectors via least squares minimization at each iteration; the second is a two-time-scale stochastic approximation algorithm that takes several gradient-descent steps toward the least squares solution before obtaining the next iterate via stochastic approximation.
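To make the two weight-update schemes in the abstract concrete, here is a minimal sketch (not the paper's exact method) of linear value approximation V(s) ≈ φ(s)·w, contrasting a full least-squares solve per iteration with a few gradient-descent steps toward that solution; the feature matrix `Phi`, target values `v`, and step size are hypothetical illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 4))        # feature vectors, one row per sampled state
w_true = np.array([1.0, -2.0, 0.5, 3.0])
v = Phi @ w_true                      # noiseless value targets, for illustration only

# Scheme 1 (sketch): solve the least squares problem exactly each iteration.
w_ls, *_ = np.linalg.lstsq(Phi, v, rcond=None)

# Scheme 2 (sketch): take several gradient-descent steps toward the same
# least squares solution instead of solving it in closed form.
w_gd = np.zeros(4)
step = 0.01
for _ in range(500):
    grad = Phi.T @ (Phi @ w_gd - v)   # gradient of 0.5 * ||Phi w - v||^2
    w_gd -= step * grad

print(np.allclose(w_ls, w_true, atol=1e-6))   # exact solve recovers the weights
print(np.allclose(w_gd, w_true, atol=1e-2))   # gradient steps approach them
```

In the paper's second algorithm these gradient steps run on a faster time scale than the outer policy-iteration update; the sketch above only illustrates the inner least-squares-versus-gradient-descent distinction.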