Paper Title
A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms
Paper Authors
Paper Abstract
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods. We show that value-based methods such as TD($λ$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions, thus establishing their exponentially fast convergence to a stationary distribution. We demonstrate that the stationary distribution obtained by any algorithm whose target is an expected Bellman update has a mean which is equal to the true value function. Furthermore, we establish that the distributions concentrate around their mean as the step-size shrinks. We further analyse the optimistic policy iteration algorithm, for which the contraction property does not hold, and formulate a probabilistic policy improvement property which entails the convergence of the algorithm.
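The following Python sketch is not taken from the paper; it is a toy, assumption-laden illustration of two of the abstract's claims for tabular TD(0) with a constant step size: the long-run average of the value-function iterates lies close to the true value function, and the iterates concentrate more tightly around that average as the step size shrinks. The Markov reward process, discount factor, and step sizes below are hypothetical choices, and the synchronous, independently sampled update is a simplification of the general setting analysed in the paper.

```python
# Minimal illustrative sketch (not the paper's construction): tabular TD(0)
# with a constant step size on a small synthetic Markov reward process. The
# value-function iterates V_t form a Markov chain; empirically their long-run
# mean is close to the true value function and their spread shrinks with the
# step size. Transition matrix, rewards, discount, and step sizes are
# arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 3-state Markov reward process.
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
r = np.array([1.0, 0.0, -1.0])
gamma = 0.9
n = len(r)

# True value function: V = (I - gamma * P)^{-1} r.
V_true = np.linalg.solve(np.eye(n) - gamma * P, r)

def run_td0(alpha, n_iters=100_000, burn_in=30_000):
    """Synchronous TD(0): every state gets a sampled Bellman backup each step."""
    V = np.zeros(n)
    samples = []
    for t in range(n_iters):
        # Sample one successor per state and form the stochastic Bellman target.
        next_states = np.array([rng.choice(n, p=P[s]) for s in range(n)])
        target = r + gamma * V[next_states]
        V = V + alpha * (target - V)   # constant step-size update
        if t >= burn_in:
            samples.append(V.copy())
    return np.array(samples)

for alpha in (0.2, 0.05):
    S = run_td0(alpha)
    print(f"alpha={alpha:4.2f}  |mean - V_true|_inf = "
          f"{np.abs(S.mean(0) - V_true).max():.4f}  max std = {S.std(0).max():.4f}")
```

Running the sketch, the deviation of the empirical mean from $V^\pi$ stays small for both step sizes, while the per-state standard deviation visibly drops when the step size is reduced, which is the qualitative behaviour the abstract describes.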