Paper Title


Learning Scalable Policies over Graphs for Multi-Robot Task Allocation using Capsule Attention Networks

Authors

Steve Paul, Payam Ghassemi, Souma Chowdhury

Abstract


This paper presents a novel graph reinforcement learning (RL) architecture to solve multi-robot task allocation (MRTA) problems that involve tasks with deadlines and workloads, and robot constraints such as work capacity. While drawing motivation from recent graph learning methods that learn to solve combinatorial optimization (CO) problems such as the multi-Traveling Salesman and Vehicle Routing Problems using RL, this paper seeks to provide better performance (compared to non-learning methods) and improved scalability (compared to existing learning architectures) for the stated class of MRTA problems. The proposed neural architecture, called the Capsule Attention-based Mechanism (CapAM), acts as the policy network and includes three main components: 1) an encoder: a Capsule Network based node embedding model that represents each task as a learnable feature vector; 2) a decoder: an attention-based model that facilitates sequential output; and 3) a context module that encodes the states of the mission and the robots. To train the CapAM model, a policy-gradient method based on REINFORCE is used. When evaluated over unseen scenarios, CapAM demonstrates better task completion performance and more than 10 times faster decision-making compared to standard non-learning based online MRTA methods. CapAM's advantage in generalizability, and its scalability to test problems larger than those used in training, are also successfully demonstrated in comparison to a popular approach for learning to solve CO problems, namely the purely attention-based mechanism.
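To make the decoder's role concrete, the sketch below illustrates the general idea of an attention-style sequential decoder for task allocation: each remaining task is scored by the similarity between a context vector (encoding robot/mission state) and that task's embedding, already-assigned tasks are masked out, and the next task is sampled from the resulting softmax distribution (a stochastic policy, as REINFORCE requires). All names and the dot-product scoring rule here are illustrative assumptions, not the paper's actual CapAM implementation.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_task(context, task_embeddings, visited):
    """Sample the next task index given a context vector and task embeddings.

    Hypothetical sketch: scores are dot products between the context and
    each task embedding; visited tasks are masked with -inf so they get
    zero probability. Returns (task_index, probability of that choice).
    """
    scores = []
    for i, emb in enumerate(task_embeddings):
        if i in visited:
            scores.append(float("-inf"))  # mask already-assigned tasks
        else:
            scores.append(sum(c * e for c, e in zip(context, emb)))
    probs = softmax(scores)
    # Sample from the categorical distribution defined by probs.
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i, p
    return len(probs) - 1, probs[-1]
```

In a REINFORCE-style training loop, the log of the returned probability would be accumulated along the decoded sequence and weighted by the episode's reward (e.g., number of tasks completed before their deadlines) to form the policy gradient.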
