Paper Title

Understanding and Mitigating the Limitations of Prioritized Experience Replay

Paper Authors

Yangchen Pan, Jincheng Mei, Amir-massoud Farahmand, Martha White, Hengshuai Yao, Mohsen Rohani, Jun Luo

Paper Abstract

Prioritized Experience Replay (ER) has been empirically shown to improve sample efficiency across many domains and has attracted great attention; however, there is little theoretical understanding of why such prioritized sampling helps or what its limitations are. In this work, we take a deep look at prioritized ER. In a supervised learning setting, we show the equivalence between error-based prioritized sampling under the mean squared error and uniform sampling under a cubic power loss. We then provide theoretical insight into why prioritized sampling improves the convergence rate over uniform sampling during early learning. Based on this insight, we further point out two limitations of the prioritized ER method: 1) outdated priorities and 2) insufficient coverage of the sample space. To mitigate these limitations, we propose a model-based stochastic gradient Langevin dynamics sampling method. We show that our method provides states distributed close to an ideal prioritized sampling distribution, estimated by a brute-force method, that does not suffer from the two limitations. We conduct experiments on both discrete and continuous control problems to demonstrate the efficacy of our approach, and we examine its practical implications in an autonomous driving application.
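
The equivalence stated in the abstract can be checked with a short gradient calculation. The sketch below uses our own notation (δ_i for the prediction error, p_i for the sampling probability); it illustrates the claim and is not an excerpt from the paper:

```latex
% Sketch: error-based prioritized sampling of the squared loss matches
% uniform sampling of the cubic loss in expectation (notation is ours).
% Let \delta_i = f_\theta(x_i) - y_i and p_i = |\delta_i| / \sum_j |\delta_j|.

% Expected gradient under prioritized sampling of \tfrac{1}{2}\delta_i^2:
\mathbb{E}_{i \sim p}\!\left[\nabla_\theta \tfrac{1}{2}\delta_i^2\right]
  = \sum_{i=1}^{n} \frac{|\delta_i|}{\sum_j |\delta_j|}\,\delta_i\,\nabla_\theta f_\theta(x_i)
  = \frac{1}{\sum_j |\delta_j|}\sum_{i=1}^{n} |\delta_i|\,\delta_i\,\nabla_\theta f_\theta(x_i).

% Expected gradient under uniform sampling of \tfrac{1}{3}|\delta_i|^3,
% using \nabla_\theta \tfrac{1}{3}|\delta_i|^3
%   = |\delta_i|^2 \operatorname{sign}(\delta_i)\,\nabla_\theta f_\theta(x_i)
%   = |\delta_i|\,\delta_i\,\nabla_\theta f_\theta(x_i):
\mathbb{E}_{i \sim \mathcal{U}}\!\left[\nabla_\theta \tfrac{1}{3}|\delta_i|^3\right]
  = \frac{1}{n}\sum_{i=1}^{n} |\delta_i|\,\delta_i\,\nabla_\theta f_\theta(x_i).

% The two expectations differ only by the positive scalar
% n / \sum_j |\delta_j|, so both updates move \theta in the same direction.
```

In other words, prioritizing by error magnitude effectively reshapes the loss toward a higher power, which is consistent with the faster early-learning convergence the abstract describes.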
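
To make the proposed sampler concrete, here is a minimal, self-contained sketch of stochastic gradient Langevin dynamics in Python. The target log-density (a toy two-mode function standing in for a learned log-priority such as log |TD error|) and all function names are our own assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def log_priority(s):
    # Toy unnormalized log-density over a 1-D "state" s: two modes where
    # a hypothetical TD error would be large (equal-weight Gaussian bumps).
    return np.logaddexp(-0.5 * (s - 2.0) ** 2, -0.5 * (s + 2.0) ** 2)

def grad_log_priority(s, h=1e-4):
    # Central finite difference; a real implementation would differentiate
    # a learned model of the priority instead.
    return (log_priority(s + h) - log_priority(s - h)) / (2 * h)

def sgld_chain(s0, n_steps=5000, step_size=1e-2, rng=None):
    # SGLD update: s_{k+1} = s_k + (eps/2) * grad log p(s_k) + sqrt(eps) * N(0, 1)
    rng = np.random.default_rng() if rng is None else rng
    s, samples = s0, []
    for _ in range(n_steps):
        s = (s + 0.5 * step_size * grad_log_priority(s)
               + np.sqrt(step_size) * rng.standard_normal())
        samples.append(s)
    return np.array(samples)

if __name__ == "__main__":
    chain = sgld_chain(0.0, rng=np.random.default_rng(0))
    # After burn-in, the chain should concentrate near the high-priority modes.
    print("mean |s|:", np.abs(chain[1000:]).mean())  # roughly 2
```

Because the chain follows the gradient of the log-priority while injecting noise, it keeps generating states near high-priority regions without maintaining a per-sample priority queue, which is how such a sampler can sidestep the outdated-priority and coverage issues noted above.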
