Paper Title
Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation
Paper Authors
Paper Abstract
We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes. The variant combines stochastic approximation algorithms with state-of-the-art techniques that are useful for very large MDPs, including lookahead, function approximation, and gradient descent. Specifically, we analyze two algorithms: the first uses a least squares approach, obtaining a new set of weights associated with the feature vectors via least squares minimization at each iteration; the second is a two-time-scale stochastic approximation algorithm that takes several gradient-descent steps toward the least squares solution before obtaining the next iterate via stochastic approximation.
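To make the two weight-update schemes in the abstract concrete, here is a minimal sketch (not the paper's exact method) of linear value approximation V(s) ≈ φ(s)·w, contrasting a full least-squares solve per iteration with a few gradient-descent steps toward that solution; the feature matrix `Phi`, target values `v`, and step size are hypothetical illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 4))        # feature vectors, one row per sampled state
w_true = np.array([1.0, -2.0, 0.5, 3.0])
v = Phi @ w_true                      # noiseless value targets, for illustration only

# Scheme 1 (sketch): solve the least squares problem exactly each iteration.
w_ls, *_ = np.linalg.lstsq(Phi, v, rcond=None)

# Scheme 2 (sketch): take several gradient-descent steps toward the same
# least squares solution instead of solving it in closed form.
w_gd = np.zeros(4)
step = 0.01
for _ in range(500):
    grad = Phi.T @ (Phi @ w_gd - v)   # gradient of 0.5 * ||Phi w - v||^2
    w_gd -= step * grad

print(np.allclose(w_ls, w_true, atol=1e-6))   # exact solve recovers the weights
print(np.allclose(w_gd, w_true, atol=1e-2))   # gradient steps approach them
```

In the paper's second algorithm these gradient steps run on a faster time scale than the outer policy-iteration update; the sketch above only illustrates the inner least-squares-versus-gradient-descent distinction.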