Learning Multi-agent Skills for Tabular Reinforcement Learning using Factor Graphs

Paper title

Learning Multi-agent Skills for Tabular Reinforcement Learning using Factor Graphs

Authors

Jiayu Chen, Jingdi Chen, Tian Lan, Vaneet Aggarwal

Abstract

Covering skill (a.k.a. option) discovery has been developed to improve exploration in reinforcement learning in single-agent scenarios with sparse reward signals, by connecting the most distant states in the embedding space provided by the Fiedler vector of the state transition graph. However, these option discovery methods cannot be directly extended to multi-agent scenarios, since the joint state space grows exponentially with the number of agents in the system. Thus, existing research on adopting options in multi-agent scenarios still relies on single-agent option discovery and fails to directly discover joint options that can improve the connectivity of the agents' joint state space. In this paper, we show that it is indeed possible to directly compute multi-agent options with collaborative exploratory behaviors among the agents, while still enjoying the ease of decomposition. Our key idea is to approximate the joint state space as a Kronecker graph, i.e., the Kronecker product of the individual agents' state transition graphs, based on which we can directly estimate the Fiedler vector of the joint state space using the Laplacian spectra of the individual agents' transition graphs. This decomposition enables us to efficiently construct multi-agent joint options by encouraging agents to connect the sub-goal joint states that correspond to the minimum or maximum values of the estimated joint Fiedler vector. Evaluation on multi-agent collaborative tasks shows that the proposed algorithm can successfully identify multi-agent options and significantly outperforms prior works using single-agent options or no options, in terms of both faster exploration and higher cumulative rewards.
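
The Kronecker-graph idea in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation; it assumes undirected, unweighted transition graphs and the symmetric normalized Laplacian, and it relies on the standard fact that, for a Kronecker product graph, the normalized Laplacian has eigenvalues 1 - (1 - λ_i)(1 - μ_j) with eigenvectors given by the Kronecker products of the factors' eigenvectors. The function names and the two toy graphs are hypothetical, chosen only for illustration.

```python
import numpy as np

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2} (assumes no isolated nodes)."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

def estimate_joint_fiedler(adj_a, adj_b):
    """Estimate the Fiedler pair of the Kronecker graph from the factor spectra only."""
    lam_a, vec_a = np.linalg.eigh(normalized_laplacian(adj_a))
    lam_b, vec_b = np.linalg.eigh(normalized_laplacian(adj_b))
    # Every joint eigenvalue has the form 1 - (1 - lam_a[i]) * (1 - lam_b[j]).
    joint = 1.0 - np.outer(1.0 - lam_a, 1.0 - lam_b)
    # The smallest entry is the trivial eigenvalue 0; take the second smallest.
    i, j = np.unravel_index(np.argsort(joint, axis=None)[1], joint.shape)
    return joint[i, j], np.kron(vec_a[:, i], vec_b[:, j])

def sub_goal_states(fiedler, n_b):
    """Joint states (s_a, s_b) at the minimum and maximum of the Fiedler vector."""
    lo, hi = int(np.argmin(fiedler)), int(np.argmax(fiedler))
    return divmod(lo, n_b), divmod(hi, n_b)

# Hypothetical toy graphs: agent A moves on a triangle with a tail, agent B on a triangle.
adj_a = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
adj_b = np.array([[0, 1, 1],
                  [1, 0, 1],
                  [1, 1, 0]], dtype=float)

value, fiedler = estimate_joint_fiedler(adj_a, adj_b)
goals = sub_goal_states(fiedler, len(adj_b))
```

The point of the sketch is cost: the eigendecompositions are done on the 4-state and 3-state graphs, and the 12-state joint Laplacian is never decomposed. Both toy graphs contain a triangle, so their Kronecker product is connected and the second-smallest eigenvalue is nonzero.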
