Paper Title
Smart Train Operation Algorithms based on Expert Knowledge and Reinforcement Learning
Paper Authors
Paper Abstract
In recent decades, automatic train operation (ATO) systems have been gradually adopted in many subway systems for their low cost and intelligence. This paper proposes two smart train operation algorithms that integrate expert knowledge with reinforcement learning. Compared with previous work, the proposed algorithms realize continuous-action control for the subway system and optimize multiple critical objectives without using an offline speed profile. First, by learning from the historical data of experienced subway drivers, we extract expert knowledge rules and build inference methods to guarantee the riding comfort, punctuality, and safety of the subway system. Then we develop two algorithms for optimizing the energy efficiency of train operation. One is the smart train operation (STO) algorithm based on deep deterministic policy gradient, named STOD, and the other is the STO algorithm based on the normalized advantage function, named STON. Finally, we verify the performance of the proposed algorithms via numerical simulations with real field data from the Yizhuang Line of the Beijing Subway and show that the developed smart train operation algorithms are better than expert manual driving and existing ATO algorithms in terms of energy efficiency. Moreover, STOD and STON can adapt to different trip times and different resistance conditions.