Representation of Reinforcement Learning Policies in Reproducing Kernel Hilbert Spaces

Paper title

Representation of Reinforcement Learning Policies in Reproducing Kernel Hilbert Spaces

Authors

Bogdan Mazoure, Thang Doan, Tianyu Li, Vladimir Makarenkov, Joelle Pineau, Doina Precup, Guillaume Rabusseau

Abstract

We propose a general framework for policy representation for reinforcement learning tasks. This framework involves finding a low-dimensional embedding of the policy on a reproducing kernel Hilbert space (RKHS). The usage of RKHS-based methods allows us to derive strong theoretical guarantees on the expected return of the reconstructed policy. Such guarantees are typically lacking in black-box models, but are very desirable in tasks requiring stability. We conduct several experiments on classic RL domains. The results confirm that the policies can be robustly embedded in a low-dimensional space while the embedded policy incurs almost no decrease in return.
