Paper Title

Provably Safe PAC-MDP Exploration Using Analogies

Authors

Melrose Roderick, Vaishnavh Nagarajan, J. Zico Kolter

Abstract

A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration (needed to attain good performance on the task) with safety (needed to avoid catastrophic failure). Although a growing line of work in reinforcement learning has investigated this area of "safe exploration," most existing techniques either 1) do not guarantee safety during the actual exploration process; and/or 2) limit the problem to a priori known and/or deterministic transition dynamics with strong smoothness assumptions. Addressing this gap, we propose Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown, stochastic dynamics. Our method exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense. Additionally, ASE also guides exploration towards the most task-relevant states, which empirically results in significant improvements in terms of sample efficiency, when compared to existing methods.
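
To make the abstract's central idea concrete, below is a minimal, illustrative sketch (not the authors' ASE implementation) of safe exploration via an analogy between state-action pairs: the agent rehearses a risky maneuver in a setting where slipping is harmless, pools those samples with the analogous maneuver near a hazard, and only attempts the hazardous version once a pessimistic estimate of the slip probability falls below a tolerance. All names here (PRACTICE, EDGE, RISK_TOL, and so on) are hypothetical and exist only for this toy example.

```python
# Toy demonstration of analogy-based safe exploration (assumptions noted above).
import math
import random

random.seed(0)

SLIP = 0.05        # true (unknown to the agent) probability the maneuver slips
RISK_TOL = 0.10    # largest pessimistic slip probability the agent will accept

# Toy states. Slipping at PRACTICE is harmless; slipping at EDGE is catastrophic.
PRACTICE, PRACTICE_OK, PRACTICE_SLIP, EDGE, GOAL, FAIL = range(6)

def true_step(state, action):
    """Ground-truth stochastic dynamics, hidden from the agent."""
    slipped = random.random() < SLIP
    if state == PRACTICE and action == "maneuver":
        return PRACTICE_SLIP if slipped else PRACTICE_OK
    if state in (PRACTICE_OK, PRACTICE_SLIP) and action == "reset":
        return PRACTICE
    if state == PRACTICE and action == "go_to_edge":
        return EDGE
    if state == EDGE and action == "maneuver":
        return FAIL if slipped else GOAL
    return state

# The analogy: (PRACTICE, "maneuver") and (EDGE, "maneuver") share one slip
# probability, so harmless rehearsal samples transfer to the dangerous pair.
slips, trials = 0, 0

def pessimistic_slip_bound():
    """Empirical slip rate plus a Hoeffding-style confidence width (99% level)."""
    if trials == 0:
        return 1.0                      # no data yet: assume the worst case
    return slips / trials + math.sqrt(math.log(100) / (2 * trials))

state = PRACTICE
for t in range(10_000):
    if state == GOAL:
        print(f"reached GOAL after {t} steps and {trials} rehearsals")
        break
    if pessimistic_slip_bound() > RISK_TOL:
        # Not yet provably safe enough: rehearse where a slip cannot hurt.
        if state == PRACTICE:
            nxt = true_step(state, "maneuver")
            trials += 1
            slips += int(nxt == PRACTICE_SLIP)
        else:
            nxt = true_step(state, "reset")
    else:
        # Pessimistic bound is below tolerance: walk to the edge and attempt it.
        if state == EDGE:
            nxt = true_step(state, "maneuver")
        elif state == PRACTICE:
            nxt = true_step(state, "go_to_edge")
        else:
            nxt = true_step(state, "reset")
    if nxt == FAIL:
        # Residual stochastic risk; ASE's PAC-style parameters would bound it.
        print("slipped at the edge despite the pessimistic check")
        break
    state = nxt
```

The sketch only captures the flavor of the idea: ASE itself works over general MDPs with unknown stochastic dynamics, reasons about reachability of safe states, and directs exploration toward task-relevant regions rather than a single hand-specified rehearsal loop.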
