Paper Title

Learning Parsimonious Dynamics for Generalization in Reinforcement Learning

Authors

Tankred Saanum, Eric Schulz

Abstract

Humans are skillful navigators: We aptly maneuver through new places, realize when we are back at a location we have seen before, and can even conceive of shortcuts that go through parts of our environments we have never visited. Current methods in model-based reinforcement learning on the other hand struggle with generalizing about environment dynamics out of the training distribution. We argue that two principles can help bridge this gap: latent learning and parsimonious dynamics. Humans tend to think about environment dynamics in simple terms -- we reason about trajectories not in reference to what we expect to see along a path, but rather in an abstract latent space, containing information about the places' spatial coordinates. Moreover, we assume that moving around in novel parts of our environment works the same way as in parts we are familiar with. These two principles work together in tandem: it is in the latent space that the dynamics show parsimonious characteristics. We develop a model that learns such parsimonious dynamics. Using a variational objective, our model is trained to reconstruct experienced transitions in a latent space using locally linear transformations, while encouraged to invoke as few distinct transformations as possible. Using our framework, we demonstrate the utility of learning parsimonious latent dynamics models in a range of policy learning and planning tasks.
