Paper Title

Convergence of Q-value in case of Gaussian rewards

Paper Authors

Konatsu Miyamoto, Masaya Suzuki, Yuma Kigami, Kodai Satake

Paper Abstract

In this paper, as a study of reinforcement learning, we prove the convergence of the Q-function for unbounded rewards such as Gaussian-distributed rewards. By the central limit theorem, it is natural in some real-world applications to assume that rewards follow a Gaussian distribution, but existing proofs cannot guarantee convergence of the Q-function in that case. Furthermore, in distributional reinforcement learning and Bayesian reinforcement learning, which have become popular in recent years, it is desirable to allow rewards to follow a Gaussian distribution. Therefore, in this paper, we prove the convergence of the Q-function under the condition $E[r(s,a)^2]<\infty$, which is much more relaxed than in existing research. Finally, as a bonus, a proof of the policy gradient theorem for distributional reinforcement learning is also provided.
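As an illustrative aside (not part of the paper), the setting the abstract describes can be sketched with tabular Q-learning on a toy MDP whose rewards are Gaussian: each sampled reward is unbounded, but $E[r(s,a)^2]<\infty$ holds, which is the relaxed condition under which the paper proves convergence. The environment, reward parameters, and step-size schedule below are all assumptions chosen for demonstration, not taken from the paper.

```python
import numpy as np

# Minimal sketch (not the paper's construction): tabular Q-learning on an
# assumed 2-state, 2-action MDP with Gaussian rewards, i.e. rewards are
# unbounded but satisfy E[r(s,a)^2] < infinity.

rng = np.random.default_rng(0)

n_states, n_actions, gamma = 2, 2, 0.9
# Assumed toy transition kernel: P[s, a] is a distribution over next states.
P = np.array([[[0.8, 0.2], [0.2, 0.8]],
              [[0.5, 0.5], [0.9, 0.1]]])
# Assumed Gaussian reward parameters: mean and std per (s, a).
r_mean = np.array([[1.0, 0.0], [0.5, 2.0]])
r_std = np.array([[1.0, 2.0], [0.5, 1.0]])

Q = np.zeros((n_states, n_actions))
s = 0
for t in range(1, 200_000):
    a = rng.integers(n_actions)                 # uniform exploratory behaviour policy
    r = rng.normal(r_mean[s, a], r_std[s, a])   # Gaussian (unbounded) reward sample
    s_next = rng.choice(n_states, p=P[s, a])
    alpha = 1.0 / (1.0 + 0.001 * t)             # Robbins-Monro style step size
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print("Estimated Q-values:\n", Q)
```

Classical convergence proofs for Q-learning assume bounded rewards; with Gaussian rewards individual updates can be arbitrarily large, which is why a result requiring only a finite second moment is the relevant relaxation here.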
