Paper Title
Rényi State Entropy for Exploration Acceleration in Reinforcement Learning
Paper Authors
Paper Abstract
One of the most critical challenges in deep reinforcement learning is maintaining the agent's long-term exploration capability. To tackle this problem, it has recently been proposed to provide intrinsic rewards that encourage the agent to explore. However, most existing intrinsic reward-based methods in the literature fail to provide sustainable exploration incentives, a problem known as vanishing rewards. In addition, these conventional methods rely on complex models and additional memory in their learning procedures, resulting in high computational complexity and low robustness. In this work, a novel intrinsic reward module based on the Rényi entropy is proposed to provide high-quality intrinsic rewards. It is shown that the proposed method generalizes existing state entropy maximization methods. In particular, a $k$-nearest-neighbor estimator is introduced for entropy estimation, and a $k$-value search method is designed to guarantee estimation accuracy. Extensive simulation results demonstrate that the proposed Rényi entropy-based method achieves higher performance than existing schemes.
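For intuition (not part of the abstract itself): the Rényi entropy of order $\alpha$ is $H_\alpha(X) = \frac{1}{1-\alpha}\log\sum_i p_i^\alpha$, which recovers the Shannon entropy as $\alpha \to 1$; this is the sense in which a Rényi objective generalizes Shannon state-entropy maximization. The sketch below shows, under assumed details, how a $k$-nearest-neighbor estimate of such an entropy can be converted into per-state intrinsic rewards. The function name `knn_intrinsic_rewards`, its hyperparameters, and the dropped normalization constants are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch (assumed details, not the paper's exact method) of turning a
# k-nearest-neighbor Rényi state-entropy estimate into per-state intrinsic rewards.
import numpy as np


def knn_intrinsic_rewards(embeddings: np.ndarray, k: int = 3, alpha: float = 0.5) -> np.ndarray:
    """Return one intrinsic reward per state in the batch.

    embeddings: (N, d) array of state embeddings collected during rollout.
    k:          number of neighbors used by the entropy estimator.
    alpha:      Rényi order; alpha -> 1 recovers a Shannon-style estimator.
    """
    d = embeddings.shape[1]

    # Pairwise Euclidean distances between all state embeddings in the batch.
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude each state from its own neighbor set

    # Distance from each state to its k-th nearest neighbor.
    knn_dist = np.sort(dists, axis=1)[:, k - 1]

    if np.isclose(alpha, 1.0):
        # Shannon limit: reward grows with the log of the k-NN distance,
        # the familiar state-entropy intrinsic reward.
        return np.log(1.0 + knn_dist)

    # Per-sample term of a k-NN Rényi entropy estimator: the k-NN distance raised
    # to the power d * (1 - alpha). Normalization constants are omitted since only
    # the relative size of the reward matters for exploration.
    return knn_dist ** (d * (1.0 - alpha))
```

In practice such rewards are typically normalized and combined with the extrinsic reward via a (possibly decaying) coefficient; those training details are beyond the scope of the abstract.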