Paper Title

Don't do it: Safer Reinforcement Learning With Rule-based Guidance

Authors

Ekaterina Nikonova, Cheng Xue, Jochen Renz

Abstract

During training, reinforcement learning systems interact with the world without considering the safety of their actions. When deployed into the real world, such systems can be dangerous and cause harm to their surroundings. Often, dangerous situations can be mitigated by defining a set of rules that the system should not violate under any conditions. For example, in robot navigation, one safety rule would be to avoid colliding with surrounding objects and people. In this work, we define safety rules in terms of the relationships between the agent and objects and use them to prevent reinforcement learning systems from performing potentially harmful actions. We propose a new safe epsilon-greedy algorithm that uses safety rules to override agents' actions if they are considered to be unsafe. In our experiments, we show that a safe epsilon-greedy policy significantly increases the safety of the agent during training, improves learning efficiency, resulting in much faster convergence, and achieves better performance than the base model.
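The core idea above can be sketched in a few lines: before the usual epsilon-greedy choice, a rule-based check filters out actions flagged as unsafe for the current state. This is a minimal illustration, not the paper's implementation; the names (`q_values`, `is_unsafe`, the fallback when no action is safe) are assumptions made for the example.

```python
import random

def safe_epsilon_greedy(q_values, state, is_unsafe, epsilon=0.1):
    """Pick an epsilon-greedy action, restricted to actions that the
    rule-based safety check does not flag as unsafe in `state`.

    q_values:  per-action value estimates for the current state
    is_unsafe: callable (state, action) -> bool encoding the safety rules
    """
    actions = list(range(len(q_values)))
    # Apply the safety rules: drop actions the rules consider harmful.
    allowed = [a for a in actions if not is_unsafe(state, a)]
    if not allowed:
        # No action passes the rules: fall back to the full action set
        # (how the paper handles this case is not specified here).
        allowed = actions
    if random.random() < epsilon:
        # Explore, but only among the safe actions.
        return random.choice(allowed)
    # Exploit: greedy choice over the safe subset.
    return max(allowed, key=lambda a: q_values[a])
```

For instance, with values `[1.0, 5.0, 3.0]` and a rule forbidding action 1, the greedy choice (epsilon = 0) falls back to action 2, the best-valued action that the rules allow.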
