MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning

Paper title

MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning

Authors

Elise van der Pol, Daniel E. Worrall, Herke van Hoof, Frans A. Oliehoek, Max Welling

Abstract

This paper introduces MDP homomorphic networks for deep reinforcement learning. MDP homomorphic networks are neural networks that are equivariant under symmetries in the joint state-action space of an MDP. Current approaches to deep reinforcement learning do not usually exploit knowledge about such structure. By building this prior knowledge into policy and value networks using an equivariance constraint, we can reduce the size of the solution space. We specifically focus on group-structured symmetries (invertible transformations). Additionally, we introduce an easy method for constructing equivariant network layers numerically, so the system designer need not solve the constraints by hand, as is typically done. We construct MDP homomorphic MLPs and CNNs that are equivariant under either a group of reflections or rotations. We show that such networks converge faster than unstructured baselines on CartPole, a grid world and Pong.
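The abstract mentions constructing equivariant layers numerically instead of solving the constraints by hand. The following is a minimal sketch of one way to do that, under stated assumptions; it is not the authors' code. A linear layer W is equivariant when W L_g = K_g W for every group element g, where L_g transforms the input (state) and K_g transforms the output (action logits). Averaging K_g^{-1} W L_g over the group projects any W onto matrices that satisfy this constraint, so symmetrizing random matrices and taking an SVD of the results yields a basis of equivariant weights. The helper names `symmetrize` and `equivariant_basis`, and the CartPole-style representations (a reflection that negates all four state variables and swaps the two actions), are illustrative assumptions.

```python
import numpy as np


def symmetrize(W, in_reps, out_reps):
    # Hypothetical helper: group-average K_g^{-1} W L_g over all g.
    # The result W' satisfies W' L_g = K_g W' for every group element g.
    acc = np.zeros_like(W)
    for L, K in zip(in_reps, out_reps):
        acc += np.linalg.inv(K) @ W @ L
    return acc / len(in_reps)


def equivariant_basis(in_reps, out_reps, tol=1e-8, seed=0):
    # Hypothetical helper: numerically find a basis of all W with W L_g = K_g W.
    # in_reps / out_reps list the matrices L_g and K_g in the same group order.
    rng = np.random.default_rng(seed)
    n_out, n_in = out_reps[0].shape[0], in_reps[0].shape[0]
    dim = n_out * n_in
    # Symmetrize enough random matrices to span the equivariant subspace.
    samples = np.stack([
        symmetrize(rng.standard_normal((n_out, n_in)), in_reps, out_reps).ravel()
        for _ in range(dim)
    ])
    # Right-singular vectors with non-negligible singular values form a basis.
    _, s, vh = np.linalg.svd(samples, full_matrices=False)
    rank = int(np.sum(s > tol * s.max()))
    return vh[:rank].reshape(rank, n_out, n_in)


# Assumed CartPole-style symmetry: the reflection negates the state
# (x, x_dot, theta, theta_dot) and swaps the actions (push left, push right).
identity_s, reflect_s = np.eye(4), -np.eye(4)
identity_a, swap_a = np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])

basis = equivariant_basis([identity_s, reflect_s], [identity_a, swap_a])
print(basis.shape)  # (4, 2, 4)

# Any linear combination of the basis is an equivariant policy layer:
# reflecting the state swaps the two action logits.
coeffs = np.random.default_rng(1).standard_normal(len(basis))
W = np.tensordot(coeffs, basis, axes=1)
state = np.array([0.1, -0.3, 0.05, 0.2])
print(np.allclose(W @ (reflect_s @ state), swap_a @ (W @ state)))  # True
```

In a setup like this, training would learn only the coefficients on the basis, which is one concrete way an equivariance constraint shrinks the solution space: the 2×4 layer above keeps 4 free parameters instead of 8. Hidden layers would need their own choice of group representation, which this sketch does not cover.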
