Paper Title
Steady-State Planning in Expected Reward Multichain MDPs
Paper Authors
Paper Abstract
There has been increased interest in the planning community in the formal synthesis of decision-making policies. This formal synthesis typically entails finding a policy that satisfies formal specifications in the form of some well-defined logic. While many such logics have been proposed, with varying degrees of expressiveness and complexity in their capacity to capture desirable agent behavior, their value is limited when deriving decision-making policies that must satisfy certain types of asymptotic behavior in general system models. In particular, we are interested in specifying constraints on the steady-state behavior of an agent, which captures the proportion of time the agent spends in each state as it interacts with its environment for an indefinite period of time. This is sometimes called the average or expected behavior of the agent, and the associated planning problem faces significant challenges unless strong restrictions are imposed on the underlying model in terms of the connectivity of its graph structure. In this paper, we explore this steady-state planning problem, which consists of deriving a decision-making policy for an agent such that constraints on its steady-state behavior are satisfied. We propose a linear programming solution for the general case of multichain Markov decision processes (MDPs) and prove that optimal solutions to the proposed programs yield stationary policies with rigorous guarantees of behavior.
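To make the linear-programming approach concrete, below is a minimal sketch for the simpler recurrent (ergodic) case, where the steady-state LP can be written directly over occupation measures x(s, a); the multichain programs developed in the paper require additional machinery beyond this. The 3-state MDP, the random transition tensor `P`, the reward matrix `R`, and the 15% visitation bound on state 0 are all illustrative assumptions, not taken from the paper.

```python
# A minimal sketch of steady-state planning as a linear program, assuming
# the simpler ergodic (single recurrent class) setting rather than the
# paper's general multichain case. All numbers here are illustrative.
import numpy as np
from scipy.optimize import linprog

n_states, n_actions = 3, 2
rng = np.random.default_rng(0)

# P[s, a, s']: probability of reaching s' after taking action a in state s.
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)

# R[s, a]: expected immediate reward (hypothetical values).
R = rng.random((n_states, n_actions))

# Decision variables x[s, a]: long-run fraction of time the agent spends
# in state s while taking action a (the occupation measure).
n_vars = n_states * n_actions
def idx(s, a):
    return s * n_actions + a

# Balance equations: for every state, inflow equals outflow.
A_eq = np.zeros((n_states + 1, n_vars))
b_eq = np.zeros(n_states + 1)
for s in range(n_states):
    for sp in range(n_states):
        for a in range(n_actions):
            A_eq[s, idx(sp, a)] += P[sp, a, s]  # inflow into s
            if sp == s:
                A_eq[s, idx(sp, a)] -= 1.0      # outflow from s
# Normalization: the occupation measure is a probability distribution.
A_eq[n_states, :] = 1.0
b_eq[n_states] = 1.0

# Steady-state specification (hypothetical): spend at least 15% of the
# time in state 0, i.e. sum_a x[0, a] >= 0.15.
A_ub = np.zeros((1, n_vars))
for a in range(n_actions):
    A_ub[0, idx(0, a)] = -1.0
b_ub = np.array([-0.15])

# Maximize the expected average reward sum_{s,a} x[s,a] * R[s,a]
# (linprog minimizes, so negate the objective).
res = linprog(-R.reshape(-1), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n_vars)
assert res.success, res.message

x = res.x.reshape(n_states, n_actions)
policy = x / x.sum(axis=1, keepdims=True)   # pi(a | s) proportional to x[s, a]
print("steady-state distribution:", x.sum(axis=1))
print("stationary policy:\n", policy)
```

An optimal solution x* induces a stationary policy with π(a|s) proportional to x*(s, a), and its state marginals give the steady-state distribution on which the specification constraint acts; this recovery step mirrors the classical average-reward LP construction that the paper generalizes to the multichain setting.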