Paper Title

Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation

Authors

Anna Winnicki, R. Srikant

Abstract

We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes that involves the use of stochastic approximation algorithms along with state-of-the-art techniques that are useful for very large MDPs, including lookahead, function approximation, and gradient descent. Specifically, we analyze two algorithms; the first algorithm involves a least squares approach where a new set of weights associated with feature vectors is obtained via least squares minimization at each iteration and the second algorithm involves a two-time-scale stochastic approximation algorithm taking several steps of gradient descent towards the least squares solution before obtaining the next iterate using a stochastic approximation algorithm.
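
The two weight updates contrasted in the abstract can be illustrated with a small sketch: solving a least squares problem for the feature weights outright, versus taking a few gradient descent steps towards that solution. This is only an illustration under simplifying assumptions, not the authors' algorithm: it leaves out lookahead, simulation noise, and the stochastic approximation step, it is restricted to two-dimensional features, and the function names, data, and step size are all hypothetical.

```python
def least_squares_weights(features, targets):
    """Minimize sum_i (phi_i . theta - y_i)^2 for 2-D features via the normal equations."""
    a11 = sum(p[0] * p[0] for p in features)
    a12 = sum(p[0] * p[1] for p in features)
    a22 = sum(p[1] * p[1] for p in features)
    b1 = sum(p[0] * y for p, y in zip(features, targets))
    b2 = sum(p[1] * y for p, y in zip(features, targets))
    det = a11 * a22 - a12 * a12  # assumed nonzero (features not collinear)
    return [(a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]


def gradient_steps(theta, features, targets, step_size, num_steps):
    """Take num_steps of plain gradient descent on the same squared-error objective."""
    theta = list(theta)
    for _ in range(num_steps):
        grad = [0.0, 0.0]
        for p, y in zip(features, targets):
            err = p[0] * theta[0] + p[1] * theta[1] - y
            grad[0] += 2.0 * err * p[0]
            grad[1] += 2.0 * err * p[1]
        theta = [theta[0] - step_size * grad[0], theta[1] - step_size * grad[1]]
    return theta


def distance(u, v):
    """Euclidean distance between two 2-D weight vectors."""
    return ((u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2) ** 0.5


# Hypothetical data: feature vectors for three states and estimated values for them.
features = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
targets = [1.0, 3.0, 5.0]

theta_ls = least_squares_weights(features, targets)                # full least squares solve
theta_few = gradient_steps([0.0, 0.0], features, targets, 0.05, 5)     # a few steps only
theta_many = gradient_steps([0.0, 0.0], features, targets, 0.05, 2000)  # many steps
```

With a sufficiently small step size, a handful of gradient steps moves the weights only part of the way to the least squares solution, while many steps approach it closely; the first case is cheaper per iteration, which is the trade-off the second algorithm in the abstract is built around.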
