Paper Title

Instance Weighted Incremental Evolution Strategies for Reinforcement Learning in Dynamic Environments

Authors

Zhi Wang, Chunlin Chen, Daoyi Dong

Abstract

Evolution strategies (ES), a family of black-box optimization algorithms, have recently emerged as a scalable alternative to reinforcement learning (RL) approaches such as Q-learning or policy gradient, and are much faster when many central processing units (CPUs) are available due to better parallelization. In this paper, we propose a systematic incremental learning method for ES in dynamic environments. The goal is to incrementally adjust a previously learned policy to a new one whenever the environment changes. We incorporate an instance weighting mechanism into ES to facilitate its learning adaptation, while retaining the scalability of ES. During parameter updates, higher weights are assigned to instances that contain more new knowledge, thus encouraging the search distribution to move towards new promising areas of the parameter space. We propose two easy-to-implement metrics to calculate the weights: instance novelty and instance quality. Instance novelty measures an instance's difference from the previous optimum in the original environment, while instance quality corresponds to how well an instance performs in the new environment. The resulting algorithm, Instance Weighted Incremental Evolution Strategies (IW-IES), is verified to achieve significantly improved performance on challenging RL tasks ranging from robot navigation to locomotion. This paper thus introduces a family of scalable ES algorithms for RL domains that enables rapid learning adaptation to dynamic environments.
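
The abstract describes the weighting mechanism only at a high level, so the following is a minimal Python sketch under stated assumptions, not the authors' method. It assumes novelty is the Euclidean distance of a sampled parameter vector from the previous optimum, quality is its fitness in the new environment, both are min-max normalized, mixed with a hypothetical coefficient `alpha`, and turned into weights with a softmax. The names `instance_weights` and `iw_es_step` are illustrative only. The weights then replace the uniform 1/n factor in a standard ES gradient estimate.

```python
import math
import random


def _min_max(xs):
    """Scale values to [0, 1]; constant inputs map to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi - lo < 1e-12:
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def instance_weights(candidates, fitness, prev_opt, alpha=0.5):
    """Hypothetical weighting: mix normalized novelty and quality, then softmax.

    novelty: Euclidean distance of each candidate from the previous optimum
    quality: the candidate's fitness in the new environment
    alpha:   assumed trade-off between the two metrics (not from the paper)
    """
    novelty = _min_max([math.dist(c, prev_opt) for c in candidates])
    quality = _min_max(fitness)
    scores = [alpha * n + (1.0 - alpha) * q for n, q in zip(novelty, quality)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]  # positive, sums to 1


def iw_es_step(theta, prev_opt, fitness_fn, pop=20, sigma=0.1, lr=0.01,
               alpha=0.5, rng=random):
    """One ES update in which each sampled instance has its own weight."""
    dim = len(theta)
    eps = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop)]
    cands = [[t + sigma * e for t, e in zip(theta, ep)] for ep in eps]
    fit = [fitness_fn(c) for c in cands]
    w = instance_weights(cands, fit, prev_opt, alpha)
    # Standardize fitness so the step size does not depend on reward scale
    mean = sum(fit) / pop
    std = math.sqrt(sum((f - mean) ** 2 for f in fit) / pop) + 1e-8
    adv = [(f - mean) / std for f in fit]
    # Uniform weights w_i = 1/pop recover the usual ES gradient estimate
    grad = [sum(w[i] * adv[i] * eps[i][j] for i in range(pop)) / sigma
            for j in range(dim)]
    return [t + lr * g for t, g in zip(theta, grad)]


if __name__ == "__main__":
    rng = random.Random(0)
    prev_opt = [1.0, 1.0]   # optimum learned in the original (toy) environment
    target = [-1.0, 2.0]    # optimum after the (toy) environment change
    fitness = lambda p: -math.dist(p, target) ** 2
    theta = list(prev_opt)  # start from the previously learned parameters
    for _ in range(300):
        theta = iw_es_step(theta, prev_opt, fitness, rng=rng)
    print(math.dist(theta, target))
```

With uniform weights the step reduces to the standard ES update, so in this sketch the weighting only reshapes how much each sampled instance contributes; raising `alpha` pushes the search harder away from the old optimum.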
