Paper Title

Improving Generalization in Reinforcement Learning with Mixture Regularization

Paper Authors

Kaixin Wang, Bingyi Kang, Jie Shao, Jiashi Feng

Paper Abstract

Deep reinforcement learning (RL) agents trained in a limited set of environments tend to suffer from overfitting and fail to generalize to unseen testing environments. To improve their generalizability, data augmentation approaches (e.g. cutout and random convolution) have previously been explored to increase data diversity. However, we find these approaches only locally perturb the observations regardless of the training environments, showing limited effectiveness in enhancing data diversity and generalization performance. In this work, we introduce a simple approach, named mixreg, which trains agents on a mixture of observations from different training environments and imposes linearity constraints on the observation interpolations and the supervision (e.g. associated reward) interpolations. Mixreg increases data diversity more effectively and helps learn smoother policies. We verify its effectiveness in improving generalization by conducting extensive experiments on the large-scale Procgen benchmark. Results show mixreg outperforms well-established baselines on unseen testing environments by a large margin. Mixreg is simple, effective and general. It can be applied to both policy-based and value-based RL algorithms. Code is available at https://github.com/kaixin96/mixreg.
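The abstract describes mixreg as training on convex combinations of observations drawn from different training environments, with the supervision signal (e.g. the reward) interpolated by the same coefficient. The snippet below is a minimal, hypothetical sketch of such a mixing step; the function name `mix_batch`, the Beta(α, α) sampling, and the array shapes are illustrative assumptions, and the authors' actual implementation is in the linked repository.

```python
# Minimal sketch of a mixup-style mixing step, assuming image observations
# of shape (B, H, W, C) and scalar supervision (e.g. rewards or value targets).
import numpy as np

def mix_batch(obs, sup, alpha=0.2, rng=np.random):
    """Convexly combine each sample with a randomly permuted partner.

    Mixing observations and supervision with the same coefficient imposes
    the linearity constraint described in the abstract.
    """
    batch_size = obs.shape[0]
    lam = rng.beta(alpha, alpha, size=batch_size)   # per-sample mixing coefficients
    perm = rng.permutation(batch_size)              # partner indices
    lam_obs = lam.reshape(-1, 1, 1, 1)              # broadcast over pixels/channels
    mixed_obs = lam_obs * obs + (1.0 - lam_obs) * obs[perm]
    mixed_sup = lam * sup + (1.0 - lam) * sup[perm]
    return mixed_obs, mixed_sup
```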
