Paper Title
Goal-Conditioned Reinforcement Learning in the Presence of an Adversary
Paper Authors
Paper Abstract
Reinforcement learning has seen increasing applications in real-world contexts over the past few years. However, physical environments are often imperfect, and policies that perform well in simulation might not achieve the same performance when applied elsewhere. A common approach to combat this is to train agents in the presence of an adversary. An adversary acts to destabilise the agent, which learns a more robust policy and can better handle realistic conditions. Many real-world applications of reinforcement learning also make use of goal-conditioning: this is particularly useful in the context of robotics, as it allows the agent to act differently depending on which goal is selected. Here, we focus on the problem of goal-conditioned learning in the presence of an adversary. We first present DigitFlip and CLEVR-Play, two novel goal-conditioned environments that support acting against an adversary. Next, we propose EHER and CHER -- two HER-based algorithms for goal-conditioned learning -- and evaluate their performance. Finally, we unify the two threads and introduce IGOAL: a novel framework for goal-conditioned learning in the presence of an adversary. Experimental results show that combining IGOAL with EHER allows agents to significantly outperform existing approaches when acting against both random and competent adversaries.
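The abstract combines two ingredients: goal-conditioned learning built on HER (Hindsight Experience Replay) and training in the presence of an adversary. The sketch below is a minimal illustration of both on the classic bit-flip toy task: at each step an adversary perturbs the environment alongside the agent, and after a failed episode the trajectory is relabelled with the goal that was actually achieved. The toy environment, the random stand-in policies, and the buffer layout are assumptions made for exposition; this is not the paper's DigitFlip or CLEVR-Play code, nor the EHER/CHER algorithms themselves.

```python
"""Minimal sketch: goal-conditioned rollout against an adversary,
plus final-goal hindsight relabelling in the spirit of HER."""

import random
from collections import deque

import numpy as np


class BitFlipEnv:
    """Toy goal-conditioned task: flip bits until the state matches the goal."""

    def __init__(self, n_bits=8, horizon=16):
        self.n_bits, self.horizon = n_bits, horizon

    def reset(self):
        self.state = np.random.randint(0, 2, self.n_bits)
        self.goal = np.random.randint(0, 2, self.n_bits)
        self.t = 0
        return self.state.copy(), self.goal.copy()

    def step(self, agent_bit, adversary_bit):
        # The agent flips one bit; the adversary flips another,
        # acting to destabilise progress towards the goal.
        self.state[agent_bit] ^= 1
        self.state[adversary_bit] ^= 1
        self.t += 1
        reached = np.array_equal(self.state, self.goal)
        reward = 0.0 if reached else -1.0  # sparse goal-reaching reward
        done = reached or self.t >= self.horizon
        return self.state.copy(), reward, done


def relabel_with_hindsight(episode):
    """Final-strategy HER: replay the episode as if the last achieved
    state had been the desired goal, turning a failure into a success."""
    achieved = episode[-1][3]  # final next_state stands in for the goal
    relabelled = []
    for state, action, _, next_state, _ in episode:
        reward = 0.0 if np.array_equal(next_state, achieved) else -1.0
        relabelled.append((state, action, reward, next_state, achieved))
    return relabelled


if __name__ == "__main__":
    env, buffer = BitFlipEnv(), deque(maxlen=100_000)
    state, goal = env.reset()
    episode, done = [], False
    while not done:
        agent_bit = random.randrange(env.n_bits)      # stand-in policy
        adversary_bit = random.randrange(env.n_bits)  # stand-in adversary
        next_state, reward, done = env.step(agent_bit, adversary_bit)
        episode.append((state, agent_bit, reward, next_state, goal.copy()))
        state = next_state
    buffer.extend(episode)                           # original desired goal
    buffer.extend(relabel_with_hindsight(episode))   # hindsight goal
    print(f"stored {len(buffer)} transitions")
```

The relabelling step is what makes HER-style methods effective under sparse rewards: even when the adversary prevents the agent from reaching the desired goal, the episode still yields useful positive-reward transitions for the goal it did reach.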