Paper Title
Flexible Attention-Based Multi-Policy Fusion for Efficient Deep Reinforcement Learning
Paper Authors
Paper Abstract
Reinforcement learning (RL) agents have long sought to approach the efficiency of human learning. Humans are great observers who can learn by aggregating external knowledge from various sources, including observations of others' policies while attempting a task. Prior studies in RL have incorporated external knowledge policies to help agents improve sample efficiency. However, it remains non-trivial to perform arbitrary combinations and replacements of those policies, an essential feature for generalization and transferability. In this work, we present Knowledge-Grounded RL (KGRL), an RL paradigm that fuses multiple knowledge policies and aims for human-like efficiency and flexibility. We propose a new actor architecture for KGRL, the Knowledge-Inclusive Attention Network (KIAN), which allows free rearrangement of knowledge policies thanks to its embedding-based attentive action prediction. KIAN also addresses entropy imbalance, a problem arising in maximum-entropy KGRL that hinders an agent from efficiently exploring the environment, through a new design of policy distributions. Experimental results show that KIAN outperforms alternative methods for incorporating external knowledge policies and achieves efficient and flexible learning. Our implementation is available at https://github.com/Pascalson/KGRL.git
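To make the fusion mechanism concrete, below is a minimal PyTorch sketch of embedding-based attentive action prediction over multiple policies. It is an illustrative assumption of how such an actor could be structured, not the authors' actual KIAN implementation; the class name AttentiveFusionActor, the network sizes, and all other details are hypothetical.

```python
import torch
import torch.nn as nn


class AttentiveFusionActor(nn.Module):
    """Illustrative sketch (not the paper's exact KIAN architecture):
    fuse a learnable inner policy with frozen external knowledge
    policies via embedding-based attention."""

    def __init__(self, obs_dim, act_dim, knowledge_policies, embed_dim=64):
        super().__init__()
        # Frozen external policies, each mapping an observation to an action.
        self.knowledge_policies = knowledge_policies
        # Learnable inner policy trained alongside the attention module.
        self.inner_policy = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, act_dim)
        )
        # Query embedding computed from the observation; one learned
        # key embedding per policy (inner policy included).
        self.query_net = nn.Linear(obs_dim, embed_dim)
        n_policies = len(knowledge_policies) + 1  # +1 for the inner policy
        self.keys = nn.Parameter(torch.randn(n_policies, embed_dim))

    def forward(self, obs):
        # Collect action proposals from every policy.
        proposals = [self.inner_policy(obs)]
        with torch.no_grad():  # external knowledge policies stay frozen
            proposals += [p(obs) for p in self.knowledge_policies]
        actions = torch.stack(proposals, dim=1)  # (B, n_policies, act_dim)

        # Attention weights: scaled similarity between the state query
        # and each policy's key embedding.
        query = self.query_net(obs)  # (B, embed_dim)
        scores = query @ self.keys.t() / self.keys.shape[1] ** 0.5
        weights = torch.softmax(scores, dim=-1)  # (B, n_policies)

        # Fused action: attention-weighted combination of the proposals.
        return (weights.unsqueeze(-1) * actions).sum(dim=1)


# Hypothetical usage with two frozen knowledge policies p1 and p2:
# actor = AttentiveFusionActor(obs_dim=8, act_dim=2, knowledge_policies=[p1, p2])
# action = actor(torch.randn(32, 8))
```

Because each policy contributes only a learned key embedding, external policies can in principle be added, removed, or swapped by editing the key set rather than retraining the whole actor, which is the kind of free knowledge rearrangement the abstract describes.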