Paper Title
Risk-Sensitive Bayesian Games for Multi-Agent Reinforcement Learning under Policy Uncertainty
Paper Authors
Abstract
In stochastic games with incomplete information, uncertainty arises from the lack of knowledge about a player's own and the other players' types, i.e., the utility functions and policy spaces, as well as from the inherent stochasticity of the players' interactions. The existing literature has studied risk in stochastic games in terms of the inherent uncertainty induced by the variability of transitions and actions. In this work, we instead focus on the risk associated with the \textit{uncertainty over types}. We contrast this with the multi-agent reinforcement learning framework, where the other agents follow fixed stationary policies, and investigate risk sensitivity arising from uncertainty about the other agents' adaptive policies. We propose risk-sensitive versions of algorithms designed for risk-neutral stochastic games, such as Iterated Best Response (IBR), Fictitious Play (FP), and a general multi-objective gradient approach using dual ascent (DAPG). Our experimental analysis shows that risk-sensitive DAPG outperforms competing algorithms on both social welfare and general-sum stochastic games.
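To make the notion of risk sensitivity over type uncertainty concrete, here is a minimal sketch of a risk-adjusted best response against a belief over opponent types. All names, the payoff matrix, and the choice of the entropic risk measure are illustrative assumptions, not details from the paper:

```python
import numpy as np

def entropic_risk(payoffs, probs, beta):
    """Entropic risk: -(1/beta) * log E[exp(-beta * U)].

    beta > 0 penalizes payoff variability (risk-averse);
    as beta -> 0 this recovers the risk-neutral expectation.
    """
    return -np.log(np.dot(probs, np.exp(-beta * payoffs))) / beta

def risk_sensitive_best_response(payoff_by_type, type_belief, beta):
    # payoff_by_type[a, t]: our utility for action a against opponent type t.
    # Pick the action maximizing the risk-adjusted value over type uncertainty.
    values = [entropic_risk(payoff_by_type[a], type_belief, beta)
              for a in range(payoff_by_type.shape[0])]
    return int(np.argmax(values)), values

# Hypothetical example: action 0 has higher mean payoff but is riskier.
A = np.array([[5.0, 0.0],   # high payoff against type 0, nothing against type 1
              [2.0, 2.0]])  # safe payoff against both types
belief = np.array([0.5, 0.5])

a_neutral = int(np.argmax(A @ belief))                       # risk-neutral: action 0
a_averse, _ = risk_sensitive_best_response(A, belief, 1.0)   # risk-averse: action 1
```

The risk-neutral player chooses the high-mean action, while the risk-averse player (beta = 1) prefers the action whose payoff does not depend on the unknown type, which is the qualitative behavior the risk-sensitive variants of IBR and FP are meant to capture.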