Paper Title

Provably Safe PAC-MDP Exploration Using Analogies

Authors

Melrose Roderick, Vaishnavh Nagarajan, J. Zico Kolter

Abstract

A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration (needed to attain good performance on the task) with safety (needed to avoid catastrophic failure). Although a growing line of work in reinforcement learning has investigated this area of "safe exploration," most existing techniques either 1) do not guarantee safety during the actual exploration process; and/or 2) limit the problem to a priori known and/or deterministic transition dynamics with strong smoothness assumptions. Addressing this gap, we propose Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown, stochastic dynamics. Our method exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense. Additionally, ASE also guides exploration towards the most task-relevant states, which empirically results in significant improvements in terms of sample efficiency, when compared to existing methods.
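
To make the abstract's central idea concrete, below is a minimal, illustrative sketch (not the authors' ASE implementation) of safe exploration via an analogy between state-action pairs: the agent rehearses a risky maneuver in a setting where slipping is harmless, pools those samples with the analogous maneuver near a hazard, and only attempts the hazardous version once a pessimistic estimate of the slip probability falls below a tolerance. All names here (PRACTICE, EDGE, RISK_TOL, and so on) are hypothetical and exist only for this toy example.

```python
# Toy demonstration of analogy-based safe exploration (assumptions noted above).
import math
import random

random.seed(0)

SLIP = 0.05        # true (unknown to the agent) probability the maneuver slips
RISK_TOL = 0.10    # largest pessimistic slip probability the agent will accept

# Toy states. Slipping at PRACTICE is harmless; slipping at EDGE is catastrophic.
PRACTICE, PRACTICE_OK, PRACTICE_SLIP, EDGE, GOAL, FAIL = range(6)

def true_step(state, action):
    """Ground-truth stochastic dynamics, hidden from the agent."""
    slipped = random.random() < SLIP
    if state == PRACTICE and action == "maneuver":
        return PRACTICE_SLIP if slipped else PRACTICE_OK
    if state in (PRACTICE_OK, PRACTICE_SLIP) and action == "reset":
        return PRACTICE
    if state == PRACTICE and action == "go_to_edge":
        return EDGE
    if state == EDGE and action == "maneuver":
        return FAIL if slipped else GOAL
    return state

# The analogy: (PRACTICE, "maneuver") and (EDGE, "maneuver") share one slip
# probability, so harmless rehearsal samples transfer to the dangerous pair.
slips, trials = 0, 0

def pessimistic_slip_bound():
    """Empirical slip rate plus a Hoeffding-style confidence width (99% level)."""
    if trials == 0:
        return 1.0                      # no data yet: assume the worst case
    return slips / trials + math.sqrt(math.log(100) / (2 * trials))

state = PRACTICE
for t in range(10_000):
    if state == GOAL:
        print(f"reached GOAL after {t} steps and {trials} rehearsals")
        break
    if pessimistic_slip_bound() > RISK_TOL:
        # Not yet provably safe enough: rehearse where a slip cannot hurt.
        if state == PRACTICE:
            nxt = true_step(state, "maneuver")
            trials += 1
            slips += int(nxt == PRACTICE_SLIP)
        else:
            nxt = true_step(state, "reset")
    else:
        # Pessimistic bound is below tolerance: walk to the edge and attempt it.
        if state == EDGE:
            nxt = true_step(state, "maneuver")
        elif state == PRACTICE:
            nxt = true_step(state, "go_to_edge")
        else:
            nxt = true_step(state, "reset")
    if nxt == FAIL:
        # Residual stochastic risk; ASE's PAC-style parameters would bound it.
        print("slipped at the edge despite the pessimistic check")
        break
    state = nxt
```

The sketch only captures the flavor of the idea: ASE itself works over general MDPs with unknown stochastic dynamics, reasons about reachability of safe states, and directs exploration toward task-relevant regions rather than a single hand-specified rehearsal loop.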
