Open-Ended Diverse Solution Discovery with Regulated Behavior Patterns for Cross-Domain Adaptation

Paper Title

Open-Ended Diverse Solution Discovery with Regulated Behavior Patterns for Cross-Domain Adaptation

Authors

Xu, Kang; Ma, Yan; Wei, Bingsheng; Li, Wei

Abstract

While reinforcement learning can achieve impressive results on complex tasks, the learned policies are generally prone to fail in downstream tasks with even minor model mismatch or unexpected perturbations. Recent work has demonstrated that a policy population with diverse behavior characteristics can generalize to downstream environments with various discrepancies. However, because the behaviors of the trained policies are unrestricted, such policies might cause catastrophic damage when deployed in practical scenarios such as real-world systems. Furthermore, training diverse policies without regulating their behavior can yield too few feasible policies for extrapolating to a wide range of test conditions with dynamics shifts. In this work, we aim to train diverse policies under the regularization of behavior patterns. We motivate our paradigm by observing the inverse dynamics in the environment with partial state information and propose Diversity in Regulation (DiR), which trains diverse policies with regulated behaviors to discover desired patterns that benefit generalization. Extensive empirical results on many variations of different environments indicate that our method attains improvements over other diversity-driven counterparts.

Paper Title