Paper title
Solving Collaborative Dec-POMDPs with Deep Reinforcement Learning Heuristics
Paper authors
Paper abstract
WQMIX, QMIX, QTRAN, and VDN are state-of-the-art (SOTA) algorithms for Dec-POMDPs. None of them can solve cooperative domains with complex agents. We present an algorithm, SA2MA, that solves such problems in two stages. In the first stage, we solve a single-agent problem and obtain a policy. In the second stage, we solve the multi-agent problem using that single-agent policy. SA2MA shows a clear advantage over all competitors in cooperative domains with complex agents.