Paper Title
$ \text{T}^3 $OMVP: A Transformer-based Time and Team Reinforcement Learning Scheme for Observation-constrained Multi-Vehicle Pursuit in Urban Area
Authors
Abstract
Smart Internet of Vehicles (IoV) combined with Artificial Intelligence (AI) will contribute to vehicle decision-making in Intelligent Transportation Systems (ITS). The Multi-Vehicle Pursuit (MVP) game, in which multiple vehicles cooperate to capture mobile targets, is gradually becoming a hot research topic. Although MVP has seen some progress in open-space environments, urban areas bring complicated road structures and restricted moving spaces that make MVP games harder to solve. In this paper, we define an Observation-constrained MVP (OMVP) problem and propose a Transformer-based Time and Team Reinforcement Learning scheme ($ \text{T}^3 $OMVP) to address it. First, a new multi-vehicle pursuit model is constructed based on a decentralized partially observable Markov decision process (Dec-POMDP) to instantiate the problem. Second, by introducing and modifying a transformer-based observation sequence, QMIX is redefined to adapt to the complicated road structure, restricted moving spaces, and constrained observations, so that vehicles are controlled to pursue the target by combining the vehicles' observations. Third, a multi-intersection urban environment is built to verify the proposed scheme. Extensive experimental results demonstrate that the proposed $ \text{T}^3 $OMVP scheme improves on state-of-the-art QMIX approaches by 9.66% to 106.25%. Code is available at https://github.com/pipihaiziguai/T3OMVP.
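For readers unfamiliar with QMIX, the baseline that the scheme redefines, its central idea is to combine per-agent Q-values into a team value through a mixing function that is monotonic in each agent's value. The sketch below illustrates only that monotonicity property. It is not the paper's network: the single-layer mixer, the linear stand-in for a hypernetwork, the function name `mix`, and all numbers are assumptions made for illustration.

```python
# Illustrative sketch only: a one-layer, QMIX-style monotonic mixer.
# The names, the linear "hypernetwork", and all numbers are assumptions,
# not taken from the T^3OMVP paper or its repository.

def mix(agent_qs, state, weight_rows, bias_row):
    """Return Q_tot = sum_i |w_i(state)| * Q_i + b(state)."""
    def linear(row):
        return sum(r * s for r, s in zip(row, state))

    # The absolute value keeps each mixing weight non-negative, so Q_tot
    # never decreases when any single agent's Q-value increases.
    weights = [abs(linear(row)) for row in weight_rows]
    bias = linear(bias_row)
    return sum(w * q for w, q in zip(weights, agent_qs)) + bias


state = [1.0, -2.0]                        # hypothetical global state features
weight_rows = [[0.5, 1.0], [-1.0, 0.25]]   # one parameter row per pursuing vehicle
bias_row = [0.1, 0.1]

q_low = mix([1.0, 2.0], state, weight_rows, bias_row)
q_high = mix([1.5, 2.0], state, weight_rows, bias_row)  # raise only vehicle 0's value
```

Because the weights are non-negative, each vehicle can act greedily on its own Q-value while remaining consistent with the greedy joint action under the team value, which is what allows decentralized execution after centralized training.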