Cautious Reinforcement Learning with Logical Constraints

Paper title

Cautious Reinforcement Learning with Logical Constraints

Authors

Mohammadhosein Hasanbeig, Alessandro Abate, Daniel Kroening

Abstract

This paper presents the concept of an adaptive safe padding that forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process. Policies are synthesised to satisfy a goal, expressed as a temporal logic formula, with maximal probability. Forcing the RL agent to stay safe during learning might limit exploration; however, we show that the proposed architecture automatically handles the trade-off between efficient progress in exploration (towards goal satisfaction) and ensuring safety. Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm. Experimental results showcase the performance of the proposed method.

Paper title