Paper Title

Provably Efficient Causal Model-Based Reinforcement Learning for Systematic Generalization

Authors

Mirco Mutti, Riccardo De Santi, Emanuele Rossi, Juan Felipe Calderon, Michael Bronstein, Marcello Restelli

Abstract

In the sequential decision-making setting, an agent aims to achieve systematic generalization over a large, possibly infinite, set of environments. Such environments are modeled as discrete Markov decision processes with both states and actions represented through a feature vector. The underlying structure of the environments allows the transition dynamics to be factored into two components: one that is environment-specific and another that is shared. Consider, as an example, a set of environments that share the laws of motion. In this setting, the agent can take a finite number of reward-free interactions from a subset of these environments. The agent must then be able to approximately solve any planning task defined over any environment in the original set, relying only on these interactions. Can we design a provably efficient algorithm that achieves this ambitious goal of systematic generalization? In this paper, we give a partially positive answer to this question. First, we provide a tractable formulation of systematic generalization by employing a causal viewpoint. Then, under specific structural assumptions, we provide a simple learning algorithm that guarantees any desired planning error up to an unavoidable sub-optimality term, while showcasing a polynomial sample complexity.
