Paper Title
Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity
Paper Authors
Paper Abstract
Quality-Diversity (QD) is a concept from Neuroevolution with some intriguing applications to Reinforcement Learning. It facilitates learning a population of agents where each member is optimized to simultaneously accumulate high task returns and exhibit behavioral diversity compared to other members. In this paper, we build on a recent kernel-based method for training a QD policy ensemble with Stein variational gradient descent. With kernels based on the $f$-divergence between the stationary distributions of policies, we convert the problem into one of efficiently estimating the ratio of these stationary distributions. We then study various distribution ratio estimators used previously for off-policy evaluation and imitation learning, and repurpose them to compute the gradients for policies in an ensemble such that the resultant population is diverse and of high quality.
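The update described in the abstract can be sketched as a Stein variational gradient descent step over an ensemble of policy parameter vectors. The sketch below is an illustrative toy, not the paper's implementation: an RBF kernel over raw parameter vectors stands in for the paper's kernel (where the squared distance would instead be an estimated $f$-divergence between stationary distributions), and `kl_from_ratios` is a hypothetical plug-in estimator showing how a KL value could be formed from samples of a stationary-distribution ratio. All function names here are assumptions for illustration.

```python
import numpy as np

def kl_from_ratios(ratios):
    # Hypothetical plug-in estimate of KL(d_i || d_j):
    # E_{d_i}[log(d_i/d_j)] approximated by the mean of sampled log-ratios,
    # where `ratios` are estimates of d_i(s, a) / d_j(s, a) drawn under d_i.
    return float(np.mean(np.log(ratios)))

def rbf_kernel(X, h):
    # X: (n, d) ensemble parameter vectors. Stand-in kernel: squared
    # Euclidean distance plays the role the estimated f-divergence
    # would play in the paper's kernel.
    diffs = X[:, None, :] - X[None, :, :]      # (n, n, d), diffs[j, i] = X[j] - X[i]
    sq = np.sum(diffs ** 2, axis=-1)           # (n, n)
    K = np.exp(-sq / h)
    # grad_{X[j]} K[j, i] = -(2/h) * (X[j] - X[i]) * K[j, i]
    gradK = -2.0 / h * diffs * K[..., None]    # (n, n, d)
    return K, gradK

def svgd_step(X, score, h=1.0, lr=0.1):
    # One SVGD step: `score` is an (n, d) array of gradients of the
    # log-target (here standing in for policy performance gradients).
    # phi_i = (1/n) * sum_j [ K[j, i] * score[j] + grad_{X[j]} K[j, i] ]
    # The first term drives quality; the second (repulsive) term
    # pushes ensemble members apart, encouraging diversity.
    n = X.shape[0]
    K, gradK = rbf_kernel(X, h)
    phi = (K.T @ score + gradK.sum(axis=0)) / n
    return X + lr * phi
```

In the paper's setting, evaluating the kernel requires the stationary-distribution ratios, which is why the choice of ratio estimator becomes the central computational question.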