Robust Constrained Reinforcement Learning for Continuous Control with Model Misspecification

Paper Title

Robust Constrained Reinforcement Learning for Continuous Control with Model Misspecification

Authors

Daniel J. Mankowitz, Dan A. Calian, Rae Jeong, Cosmin Paduraru, Nicolas Heess, Sumanth Dathathri, Martin Riedmiller, Timothy Mann

Abstract

Many real-world physical control systems are required to satisfy constraints upon deployment. Furthermore, real-world systems are often subject to effects such as non-stationarity, wear and tear, and uncalibrated sensors. Such effects perturb the system dynamics and can cause a policy trained successfully in one domain to perform poorly when deployed to a perturbed version of the same domain. This can affect a policy's ability to maximize future rewards as well as the extent to which it satisfies constraints. We refer to this as constrained model misspecification. We present an algorithm that mitigates this form of misspecification, and showcase its performance in multiple simulated MuJoCo tasks from the Real World Reinforcement Learning (RWRL) suite.
