Paper Title

Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics

Paper Authors

Michael Neunert, Abbas Abdolmaleki, Markus Wulfmeier, Thomas Lampe, Jost Tobias Springenberg, Roland Hafner, Francesco Romano, Jonas Buchli, Nicolas Heess, Martin Riedmiller

Paper Abstract

Many real-world control problems involve both discrete decision variables - such as the choice of control modes, gear switching or digital outputs - as well as continuous decision variables - such as velocity setpoints, control gains or analogue outputs. However, when defining the corresponding optimal control or reinforcement learning problem, it is commonly approximated with fully continuous or fully discrete action spaces. These simplifications aim at tailoring the problem to a particular algorithm or solver which may only support one type of action space. Alternatively, expert heuristics are used to remove discrete actions from an otherwise continuous space. In contrast, we propose to treat hybrid problems in their 'native' form by solving them with hybrid reinforcement learning, which optimizes for discrete and continuous actions simultaneously. In our experiments, we first demonstrate that the proposed approach efficiently solves such natively hybrid reinforcement learning problems. We then show, both in simulation and on robotic hardware, the benefits of removing possibly imperfect expert-designed heuristics. Lastly, hybrid reinforcement learning encourages us to rethink problem definitions. We propose reformulating control problems, e.g. by adding meta actions, to improve exploration or reduce mechanical wear and tear.
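To make the idea of a "native" hybrid action space concrete, here is a minimal sketch (not the paper's actual algorithm) of a policy that factorizes into a categorical head over discrete modes (e.g. a gear or control mode) and a diagonal-Gaussian head over continuous actions (e.g. a velocity setpoint), so that both can be sampled and scored jointly. The class name `HybridPolicy` and the 2-mode, 1-dimensional action space are illustrative assumptions, not from the paper.

```python
# Illustrative sketch of a factorized hybrid (discrete + continuous) policy.
# Assumed names: HybridPolicy, n_modes, cont_dim -- none come from the paper.
import numpy as np

class HybridPolicy:
    def __init__(self, n_modes, cont_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        # Categorical logits for the discrete action (e.g. control mode, gear).
        self.logits = np.zeros(n_modes)
        # Gaussian parameters for the continuous action (e.g. velocity setpoint).
        self.mean = np.zeros(cont_dim)
        self.log_std = np.zeros(cont_dim)

    def _probs(self):
        # Numerically stable softmax over the discrete logits.
        z = np.exp(self.logits - self.logits.max())
        return z / z.sum()

    def sample(self):
        # Draw the discrete mode and the continuous action independently;
        # the joint distribution factorizes across the two heads.
        mode = int(self.rng.choice(len(self.logits), p=self._probs()))
        cont = self.mean + np.exp(self.log_std) * self.rng.standard_normal(self.mean.shape)
        return mode, cont

    def log_prob(self, mode, cont):
        # log pi(mode, cont) = log p(mode) + log N(cont | mean, std).
        lp_disc = np.log(self._probs()[mode])
        std = np.exp(self.log_std)
        lp_cont = -0.5 * np.sum(
            ((cont - self.mean) / std) ** 2 + 2.0 * self.log_std + np.log(2.0 * np.pi)
        )
        return lp_disc + lp_cont

policy = HybridPolicy(n_modes=2, cont_dim=1)
mode, cont = policy.sample()
```

Because the joint log-probability is just the sum of the two heads' log-probabilities, any likelihood-based policy-gradient or policy-improvement update can treat the hybrid action as a single action, which is the spirit of optimizing discrete and continuous actions simultaneously.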
