Paper Title

Leverage the Average: an Analysis of KL Regularization in RL

Authors

Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist

Abstract

Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance. Yet, only little is understood theoretically about why KL regularization helps, so far. We study KL regularization within an approximate value iteration scheme and show that it implicitly averages q-values. Leveraging this insight, we provide a very strong performance bound, the very first to combine two desirable aspects: a linear dependency to the horizon (instead of quadratic) and an error propagation term involving an averaging effect of the estimation errors (instead of an accumulation effect). We also study the more general case of an additional entropy regularizer. The resulting abstract scheme encompasses many existing RL algorithms. Some of our assumptions do not hold with neural networks, so we complement this theoretical analysis with an extensive empirical study.
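A minimal sketch of the mechanism the abstract describes, in illustrative notation that is not fixed by the abstract itself (tau is the KL weight, pi_k and q_k the policy and q-estimate at iteration k): the KL-regularized greedy step keeps the new policy close to the previous one, and unrolling its closed-form solution makes the implicit averaging of q-values explicit.

```latex
% KL-regularized greedy step (sketch, illustrative notation): the new policy
% maximizes the current q-estimate while staying close, in KL divergence,
% to the previous policy.
\pi_{k+1} \in \operatorname*{arg\,max}_{\pi}\; \langle \pi, q_k \rangle \;-\; \tau\,\mathrm{KL}(\pi \,\|\, \pi_k)

% Its closed-form solution is \pi_{k+1} \propto \pi_k \exp(q_k/\tau).
% Unrolling this recursion from a uniform initial policy \pi_0 gives
\pi_{k+1} \;\propto\; \exp\!\Big(\tfrac{1}{\tau}\textstyle\sum_{j=0}^{k} q_j\Big)
\;=\; \mathrm{softmax}\!\Big(\tfrac{k+1}{\tau}\cdot\tfrac{1}{k+1}\textstyle\sum_{j=0}^{k} q_j\Big)

% That is, the policy acts greedily (through a softmax) with respect to the
% average of all past q-estimates, so estimation errors tend to average out
% rather than accumulate, which is the averaging effect the abstract's
% performance bound refers to.
```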
