Paper Title
MAVIPER: Learning Decision Tree Policies for Interpretable Multi-Agent Reinforcement Learning
Paper Authors
Paper Abstract
Many recent breakthroughs in multi-agent reinforcement learning (MARL) require the use of deep neural networks, which are challenging for human experts to interpret and understand. On the other hand, existing work on interpretable reinforcement learning (RL) has shown promise in extracting more interpretable decision tree-based policies from neural networks, but only in the single-agent setting. To fill this gap, we propose the first set of algorithms that extract interpretable decision-tree policies from neural networks trained with MARL. The first algorithm, IVIPER, extends VIPER, a recent method for single-agent interpretable RL, to the multi-agent setting. We demonstrate that IVIPER learns high-quality decision-tree policies for each agent. To better capture coordination between agents, we propose a novel centralized decision-tree training algorithm, MAVIPER. MAVIPER jointly grows the trees of each agent by predicting the behavior of the other agents using their anticipated trees, and uses resampling to focus on states that are critical for its interactions with other agents. We show that both algorithms generally outperform the baselines and that MAVIPER-trained agents achieve better-coordinated performance than IVIPER-trained agents on three different multi-agent particle-world environments.
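To make the core idea concrete, below is a minimal, hypothetical sketch of per-agent decision-tree distillation in the spirit of IVIPER: each agent's trained neural policy serves as a teacher, demonstrations are collected by rolling out the joint policy, and a shallow decision tree is fit per agent. The environment interface, the `teacher_policies` callables, and the hyperparameters are assumptions for illustration; the paper's actual algorithms additionally use Q-value-guided resampling and, in MAVIPER, jointly grown trees that anticipate the other agents' trees.

```python
# Hypothetical sketch of per-agent decision-tree distillation (IVIPER-like).
# The multi-agent env API and teacher_policies interface are assumptions,
# not the authors' implementation.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def distill_agent_policies(env, teacher_policies, n_rollouts=50, max_depth=6):
    """Fit one interpretable decision tree per agent by imitating its trained neural policy."""
    n_agents = len(teacher_policies)
    datasets = [([], []) for _ in range(n_agents)]  # (observations, actions) per agent

    # Collect demonstrations by rolling out the joint teacher policy.
    for _ in range(n_rollouts):
        obs = env.reset()  # assumed: list of per-agent observations
        done = False
        while not done:
            actions = [pi(o) for pi, o in zip(teacher_policies, obs)]
            for i, (o, a) in enumerate(zip(obs, actions)):
                datasets[i][0].append(o)
                datasets[i][1].append(a)
            obs, _, done, _ = env.step(actions)  # assumed: joint step over all agents

    # Fit a shallow tree per agent on its own (observation, action) dataset.
    trees = []
    for X, y in datasets:
        tree = DecisionTreeClassifier(max_depth=max_depth)
        tree.fit(np.asarray(X), np.asarray(y))
        trees.append(tree)
    return trees
```

The depth cap (`max_depth`) is what keeps the extracted policies small enough for a human expert to inspect; the trade-off between tree size and fidelity to the neural teacher is the central tension the paper's algorithms address.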