CAMEO: Curiosity Augmented Metropolis for Exploratory Optimal Policies

Paper Title

CAMEO: Curiosity Augmented Metropolis for Exploratory Optimal Policies

Authors

C., Simo Alami; Llorente, Fernando; Kaddah, Rim; Martino, Luca; Read, Jesse

Abstract

Reinforcement learning has drawn huge interest as a tool for solving optimal control problems. Solving a given problem (task or environment) involves converging towards an optimal policy. However, there may exist multiple optimal policies whose behaviours differ dramatically; for example, some may be faster than others but at the expense of greater risk. We consider and study a distribution of optimal policies. We design a curiosity-augmented Metropolis algorithm (CAMEO) that samples optimal policies such that these policies effectively adopt diverse behaviours, since greater behavioural diversity implies greater coverage of the different possible optimal policies. In experimental simulations, we show that CAMEO indeed obtains policies that all solve classic control problems, even in the challenging case of environments that provide sparse rewards. We further show that the sampled policies present different risk profiles, which corresponds to interesting practical applications in interpretability and represents a first step towards learning the distribution of optimal policies itself.
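To give some intuition for the Metropolis component alone, here is a minimal sketch of a plain random-walk Metropolis sampler over a one-parameter toy "policy" whose return has two equally optimal maxima. It targets p(θ) proportional to exp(β·R(θ)), so higher-return policies are sampled more often. Everything here (`toy_return`, `metropolis_sample_policies`, `beta`, `step`, and the exponential target itself) is a hypothetical assumption for illustration and is not CAMEO. The abstract does not specify CAMEO's target distribution or the form of its curiosity term, so no curiosity term is included.

```python
import math
import random


def toy_return(theta: float) -> float:
    """Hypothetical return of a 1-D toy 'policy' theta.

    Two distinct policies, theta = -2 and theta = +2, are equally optimal (return 0).
    """
    return -min((theta - 2.0) ** 2, (theta + 2.0) ** 2)


def metropolis_sample_policies(n_steps: int, beta: float = 5.0,
                               step: float = 1.5, seed: int = 0) -> list[float]:
    """Random-walk Metropolis targeting p(theta) proportional to exp(beta * R(theta)).

    Hypothetical illustration only: no curiosity term, not the CAMEO algorithm.
    """
    rng = random.Random(seed)
    theta = 0.0
    log_p = beta * toy_return(theta)
    samples = []
    for _ in range(n_steps):
        # Symmetric Gaussian proposal, so the Hastings correction cancels.
        proposal = theta + rng.gauss(0.0, step)
        log_p_prop = beta * toy_return(proposal)
        # Accept with probability min(1, p(proposal) / p(theta)).
        if rng.random() < math.exp(min(0.0, log_p_prop - log_p)):
            theta, log_p = proposal, log_p_prop
        samples.append(theta)
    return samples


samples = metropolis_sample_policies(20_000)
near_plus = sum(1 for s in samples if s > 1.0)
near_minus = sum(1 for s in samples if s < -1.0)
print(near_plus, near_minus)  # counts of samples near each optimal policy
```

With a large enough proposal step, the chain hops between θ ≈ -2 and θ ≈ +2, so a single run yields samples from both optimal policies. As `step` shrinks or `beta` grows, the chain tends to stay stuck in one mode, which is presumably the kind of coverage problem that CAMEO's curiosity augmentation is meant to address.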