具有图形时间逻辑规格的计算MDP的策略合成

论文标题

具有图形时间逻辑规格的计算MDP的策略合成

Policy Synthesis for Factored MDPs with Graph Temporal Logic Specifications

论文作者

Cubuktepe, Murat, Xu, Zhe, Topcu, Ufuk

论文摘要

我们研究了实施空间任务的多机构系统的策略的综合。我们将问题形式化为Markov决策过程，但要遵守所谓的图形时间逻辑规范。每个代理的过渡函数和空间任务取决于代理本身及其相邻药物。模型中的结构和规格使得可以开发出分布式算法，该算法给定计算的马尔可夫决策过程和图形时间逻辑公式，将综合问题分解为一组较小的合成问题，每个代理一个。我们证明，该算法在代理总数中随时间线性运行。每个代理的合成问题的大小仅在相邻代理的数量中指数级，通常比代理的数量小得多。我们在疾病控制和城市安全的案例研究中证明了算法。数值示例表明该算法可以扩展到数百个代理。

We study the synthesis of policies for multi-agent systems to implement spatial-temporal tasks. We formalize the problem as a factored Markov decision process subject to so-called graph temporal logic specifications. The transition function and the spatial-temporal task of each agent depend on the agent itself and its neighboring agents. The structure in the model and the specifications enable to develop a distributed algorithm that, given a factored Markov decision process and a graph temporal logic formula, decomposes the synthesis problem into a set of smaller synthesis problems, one for each agent. We prove that the algorithm runs in time linear in the total number of agents. The size of the synthesis problem for each agent is exponential only in the number of neighboring agents, which is typically much smaller than the number of agents. We demonstrate the algorithm in case studies on disease control and urban security. The numerical examples show that the algorithm can scale to hundreds of agents.

下载PDF全文

下载文献需遵守相关版权规定

论文标题