Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning

Paper Title

Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning

Authors

Michael Bradley Johanson, Edward Hughes, Finbarr Timbers, Joel Z. Leibo

Abstract

Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviors respond to environmental conditions in the directions predicted by supply and demand shifts in Microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods. After the price disparities emerge, some agents then discover a niche of transporting goods between regions with different prevailing prices -- a profitable strategy because they can buy goods where they are cheap and sell them where they are expensive. Finally, in a series of ablation experiments, we investigate how choices in the environmental rewards, bartering actions, agent architecture, and ability to consume tradable goods can either aid or inhibit the emergence of this economic behavior. This work is part of the environment development branch of a research program that aims to build human-like artificial general intelligence through multi-agent interactions in simulated societies. By exploring which environment features are needed for the basic phenomena of elementary microeconomics to emerge automatically from learning, we arrive at an environment that differs from those studied in prior multi-agent reinforcement learning work along several dimensions. For example, the model incorporates heterogeneous tastes and physical abilities, and agents negotiate with one another as a grounded form of communication.
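The transport niche described in the abstract rests on simple arithmetic: a good bought where it is cheap and sold where it is expensive yields a net gain. The sketch below is only an illustration of that reasoning, not the paper's environment or agent code. The function names, the apple and banana goods, and the prices are all hypothetical, and it assumes one good is priced in units of the other with no transport cost.

```python
from fractions import Fraction


def arbitrage_round_trip(start_bananas, price_cheap, price_expensive):
    """Banana holdings after one buy-transport-sell round trip.

    Hypothetical setup: prices are in bananas per apple. The trader spends
    all bananas on apples in the cheap region, then sells every apple in
    the expensive region. Transport is assumed to be free.
    """
    apples_bought = Fraction(start_bananas) / Fraction(price_cheap)
    return apples_bought * Fraction(price_expensive)


def profit(start_bananas, price_cheap, price_expensive):
    """Net gain in bananas from one round trip (negative means a loss)."""
    end_bananas = arbitrage_round_trip(start_bananas, price_cheap, price_expensive)
    return end_bananas - start_bananas


if __name__ == "__main__":
    # Apples cost 1 banana in one region and sell for 3 bananas in another.
    print(profit(6, 1, 3))
```

With equal prices in both regions the profit is zero, and it turns negative if the trader carries goods in the wrong direction, which is why the strategy only becomes available once regional price differences have emerged.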
