Paper Title

KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal

Paper Authors

Tadashi Kozuno, Wenhao Yang, Nino Vieillard, Toshinori Kitamura, Yunhao Tang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Michal Valko, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári

Paper Abstract

In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model. Particularly, we analyze mirror descent value iteration (MDVI) by Geist et al. (2019) and Vieillard et al. (2020a), which uses the Kullback-Leibler divergence and entropy regularization in its value and policy updates. Our analysis shows that it is nearly minimax-optimal for finding an $\varepsilon$-optimal policy when $\varepsilon$ is sufficiently small. This is the first theoretical result that demonstrates that a simple model-free algorithm without variance-reduction can be nearly minimax-optimal under the considered setting.
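
The abstract mentions KL-divergence and entropy regularization in MDVI's value and policy updates only at a high level. The sketch below is a minimal, illustrative rendition of such regularized updates for a tabular MDP with known dynamics; it is not the algorithm analyzed in the paper, which instead uses samples from a generative model and additional analysis-specific choices. All names and parameter values (`mdvi_sketch`, `lam`, `tau`, the demo MDP) are assumptions made for illustration.

```python
import numpy as np

def mdvi_sketch(P, r, gamma=0.9, lam=0.1, tau=0.01, iters=200):
    """Illustrative KL-entropy-regularized value iteration on a known tabular MDP.

    P: (S, A, S) transition probabilities; r: (S, A) rewards.
    lam weights KL(pi || previous pi); tau weights the entropy bonus.
    """
    S, A = r.shape
    alpha = lam / (lam + tau)       # retention of the previous policy
    beta = 1.0 / (lam + tau)        # inverse temperature applied to Q
    pi = np.full((S, A), 1.0 / A)   # start from the uniform policy
    q = np.zeros((S, A))

    for _ in range(iters):
        # Closed-form maximizer of <pi, Q> - lam*KL(pi || pi_k) + tau*H(pi):
        #   pi_{k+1}(a|s) ∝ pi_k(a|s)^alpha * exp(beta * Q_k(s, a)).
        logits = alpha * np.log(pi + 1e-12) + beta * q
        m = logits.max(axis=1, keepdims=True)          # shift for numerical stability
        pi = np.exp(logits - m)
        pi /= pi.sum(axis=1, keepdims=True)

        # The regularized state value equals the log-partition of the same logits.
        v = (lam + tau) * (m[:, 0] + np.log(np.exp(logits - m).sum(axis=1)))

        # Exact Bellman backup; a generative-model variant would replace P @ v
        # with Monte Carlo estimates from sampled next states.
        q = r + gamma * np.einsum("sat,t->sa", P, v)

    return pi, q

if __name__ == "__main__":
    # Tiny random MDP, purely for demonstration.
    rng = np.random.default_rng(0)
    S, A = 5, 3
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)
    r = rng.random((S, A))
    pi, q = mdvi_sketch(P, r)
    print(pi.round(3))
```

The softmax-with-prior form of the policy update is what the KL term buys: each new policy is a smoothed correction of the previous one rather than a greedy switch, which is the averaging effect the paper's analysis exploits.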
