Paper Title
Evolving Inborn Knowledge For Fast Adaptation in Dynamic POMDP Problems
Paper Authors
Paper Abstract
Rapid online adaptation to changing tasks is an important problem in machine learning and, recently, a focus of meta-reinforcement learning. However, reinforcement learning (RL) algorithms struggle in POMDP environments because the state of the system, essential in an RL framework, is not always observable. Additionally, hand-designed meta-RL architectures may not include computational structures suited to specific learning problems. The evolution of online learning mechanisms, in contrast, can incorporate learning strategies into an agent that (i) evolves memory when required and (ii) optimizes adaptation speed for specific online learning problems. In this paper, we exploit the highly adaptive nature of neuromodulated neural networks to evolve a controller that uses the latent space of an autoencoder in a POMDP. The analysis of the evolved networks reveals the ability of the proposed algorithm to acquire inborn knowledge in a variety of aspects, such as detecting cues that reveal implicit rewards and evolving location neurons that aid navigation. The integration of inborn knowledge and online plasticity enabled fast adaptation and better performance compared to some non-evolutionary meta-reinforcement learning algorithms. The algorithm also proved successful in the 3D gaming environment Malmo Minecraft.
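The abstract describes a controller in which evolution supplies "inborn" parameters and neuromodulated plasticity provides online adaptation on top of an autoencoder's latent representation. The sketch below is a minimal, hypothetical illustration of that general idea, not the authors' implementation: the layer sizes, the tanh activations, and the modulated Hebbian update (w += lr * m * post * pre) are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's code) of a neuromodulated
# plastic controller acting on an autoencoder's latent vector.
import numpy as np

class NeuromodulatedController:
    def __init__(self, latent_dim, action_dim, lr=0.01, rng=None):
        self.rng = rng or np.random.default_rng(0)
        # "Inborn" parameters that evolution would optimize across generations:
        # initial plastic weights, modulatory weights, and the learning rate.
        self.W = self.rng.normal(0, 0.1, (action_dim, latent_dim))   # plastic pathway
        self.W_mod = self.rng.normal(0, 0.1, (1, latent_dim))        # modulatory neuron
        self.lr = lr

    def act(self, latent):
        """Map an autoencoder latent vector to action activations."""
        post = np.tanh(self.W @ latent)      # standard (plastic) pathway output
        m = np.tanh(self.W_mod @ latent)     # neuromodulatory signal
        # Online Hebbian update gated by the modulatory signal: this is the
        # lifetime (within-episode) learning; evolution only shapes W, W_mod, lr.
        self.W += self.lr * m * np.outer(post, latent)
        return post

# Usage: `latent` would come from a pre-trained autoencoder's encoder;
# here a random vector stands in as a placeholder.
controller = NeuromodulatedController(latent_dim=32, action_dim=4)
latent = np.random.default_rng(1).normal(size=32)
action = int(np.argmax(controller.act(latent)))
```

In this kind of setup, the evolutionary search evaluates many controllers over full episodes, so fast adaptation emerges because the inborn weights and the gating of plasticity are selected precisely for quick within-episode learning.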