Solving Collaborative Dec-POMDPs with Deep Reinforcement Learning Heuristics

Paper title

Solving Collaborative Dec-POMDPs with Deep Reinforcement Learning Heuristics

Authors

Soffair, Nitsan

Abstract

WQMIX, QMIX, QTRAN, and VDN are state-of-the-art (SOTA) algorithms for Dec-POMDPs. None of them can solve complex agent cooperation domains. We present an algorithm, SA2MA, for such problems. In the first stage, we solve a single-agent problem and obtain a policy. In the second stage, we solve the multi-agent problem using the single-agent policy. SA2MA has a clear advantage over all competitors in complex agent cooperation domains.
