Paper Title
Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach
Paper Authors
Paper Abstract
We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states. BRIEE interleaves latent state discovery, exploration, and exploitation, and can provably learn a near-optimal policy with sample complexity scaling polynomially in the number of latent states, the number of actions, and the time horizon, with no dependence on the size of the potentially infinite observation space. Empirically, we show that BRIEE is more sample efficient than the state-of-the-art Block MDP algorithm HOMER and other empirical RL baselines on challenging rich-observation combination lock problems that require deep exploration.
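To make the Block MDP setting concrete, here is a minimal toy sketch of the generative structure the abstract describes: a small set of hidden latent states drives the dynamics, while the agent only ever sees rich, noisy observations emitted from the current latent state. All names, sizes, and the reward rule below are illustrative assumptions, not the paper's actual environment or algorithm.

```python
import numpy as np

class BlockMDP:
    """Toy Block MDP sketch (illustrative assumptions, not the paper's code).

    A small latent state space drives transitions and rewards; the agent
    observes only high-dimensional noisy emissions of the latent state.
    """

    def __init__(self, n_latent=3, n_actions=2, obs_dim=8, horizon=5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_latent, self.n_actions = n_latent, n_actions
        self.obs_dim, self.horizon = obs_dim, horizon
        # Deterministic latent transition table: T[s, a] -> next latent state.
        self.T = self.rng.integers(0, n_latent, size=(n_latent, n_actions))
        # Each latent state has its own observation distribution; well-separated
        # means emulate the block (decodability) assumption here.
        self.emission_means = self.rng.normal(size=(n_latent, obs_dim)) * 5.0

    def _emit(self, s):
        # Rich, noisy observation: the latent state is recoverable from it,
        # but the decoding map is unknown to the agent.
        return self.emission_means[s] + self.rng.normal(size=self.obs_dim)

    def reset(self):
        self.s, self.t = 0, 0
        return self._emit(self.s)

    def step(self, a):
        self.s = int(self.T[self.s, a])
        self.t += 1
        reward = 1.0 if self.s == self.n_latent - 1 else 0.0  # toy reward rule
        done = self.t >= self.horizon
        return self._emit(self.s), reward, done
```

An agent interacting with this environment sees only the `obs_dim`-dimensional vectors, which is why the abstract's point about sample complexity being independent of the observation space size matters: a method that enumerated observations would be hopeless here.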