Paper Title
Variational Intrinsic Control Revisited
Paper Authors
Paper Abstract
In this paper, we revisit variational intrinsic control (VIC), an unsupervised reinforcement learning method for finding the largest set of intrinsic options available to an agent. In the original work by Gregor et al. (2016), two VIC algorithms were proposed: one that represents the options explicitly, and one that does so implicitly. We show that the intrinsic reward used in the latter is subject to bias in stochastic environments, causing convergence to suboptimal solutions. To correct this behavior and achieve maximal empowerment, we propose two methods, based respectively on the transitional probability model and the Gaussian mixture model. We substantiate our claims through rigorous mathematical derivations and experimental analyses.
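For context, the empowerment objective and the intrinsic reward the abstract refers to can be sketched from the standard VIC formulation in Gregor et al. (2016); the notation below ($s_0$ for the starting state, $s_f$ for the final state, $\Omega$ for the option, $q$ for the learned inference distribution) follows that paper and is not defined in this abstract. Empowerment at $s_0$ is the channel capacity between options and final states, lower-bounded variationally as

$$
\mathcal{E}(s_0) \;=\; \max_{p(\Omega \mid s_0)} I(\Omega;\, s_f \mid s_0)
\;\ge\; \max_{p(\Omega \mid s_0)} \mathbb{E}\big[\log q(\Omega \mid s_0, s_f) \;-\; \log p(\Omega \mid s_0)\big].
$$

The implicit-option algorithm optimizes this bound by using the per-trajectory term

$$
r \;=\; \log q(\Omega \mid s_0, s_f) \;-\; \log p(\Omega \mid s_0)
$$

as its intrinsic reward; the paper's claim is that this estimator becomes biased when environment transitions are stochastic, which is what the two proposed corrections address.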