Paper Title

Convergence of Q-value in case of Gaussian rewards

Paper Authors

Konatsu Miyamoto, Masaya Suzuki, Yuma Kigami, Kodai Satake

Paper Abstract

In this paper, as a study of reinforcement learning, we prove the convergence of the Q-function for unbounded rewards such as Gaussian-distributed rewards. By the central limit theorem, it is natural in some real-world applications to assume that rewards follow a Gaussian distribution, but existing proofs cannot guarantee convergence of the Q-function in that case. Furthermore, in distributional reinforcement learning and Bayesian reinforcement learning, which have become popular in recent years, it is desirable to allow rewards to follow a Gaussian distribution. Therefore, in this paper, we prove the convergence of the Q-function under the condition $E[r(s,a)^2]<\infty$, which is much more relaxed than in existing research. Finally, as a bonus, a proof of the policy gradient theorem for distributional reinforcement learning is also provided.
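As an illustrative aside (not part of the paper), the setting the abstract describes can be sketched with tabular Q-learning on a toy MDP whose rewards are Gaussian: each sampled reward is unbounded, but $E[r(s,a)^2]<\infty$ holds, which is the relaxed condition under which the paper proves convergence. The environment, reward parameters, and step-size schedule below are all assumptions chosen for demonstration, not taken from the paper.

```python
import numpy as np

# Minimal sketch (not the paper's construction): tabular Q-learning on an
# assumed 2-state, 2-action MDP with Gaussian rewards, i.e. rewards are
# unbounded but satisfy E[r(s,a)^2] < infinity.

rng = np.random.default_rng(0)

n_states, n_actions, gamma = 2, 2, 0.9
# Assumed toy transition kernel: P[s, a] is a distribution over next states.
P = np.array([[[0.8, 0.2], [0.2, 0.8]],
              [[0.5, 0.5], [0.9, 0.1]]])
# Assumed Gaussian reward parameters: mean and std per (s, a).
r_mean = np.array([[1.0, 0.0], [0.5, 2.0]])
r_std = np.array([[1.0, 2.0], [0.5, 1.0]])

Q = np.zeros((n_states, n_actions))
s = 0
for t in range(1, 200_000):
    a = rng.integers(n_actions)                 # uniform exploratory behaviour policy
    r = rng.normal(r_mean[s, a], r_std[s, a])   # Gaussian (unbounded) reward sample
    s_next = rng.choice(n_states, p=P[s, a])
    alpha = 1.0 / (1.0 + 0.001 * t)             # Robbins-Monro style step size
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print("Estimated Q-values:\n", Q)
```

Classical convergence proofs for Q-learning assume bounded rewards; with Gaussian rewards individual updates can be arbitrarily large, which is why a result requiring only a finite second moment is the relevant relaxation here.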
