使用策略优化学习车辆路线问题

论文标题

使用策略优化学习车辆路线问题

Learning Vehicle Routing Problems using Policy Optimisation

论文作者

Sultana, Nasrin, Chan, Jeffrey, Qin, A. K., Sarwar, Tabinda

论文摘要

深度强化学习（DRL）已用于学习有效的启发式方法，以通过策略网络解决复杂的组合优化问题，并证明了有希望的绩效。现有作品集中在解决（车辆）路由问题上，因为它们在非平凡和难度之间具有良好的平衡。最先进的方法使用加强学习学习政策，而学识渊博的政策则是伪求解器。在某些情况下，这些方法表现出良好的性能，但是鉴于较大的搜索空间典型的组合/路由问题，它们可能会过快地融合到不良的政策上。为了防止这种情况，在本文中，我们提出了一种方法熵正规化增强学习（ERRL），该学习通过提供更多随机策略来支持探索，从而倾向于改善优化。从经验上讲，较低的差异ERRL提供了快速，稳定的RL训练。我们还在测试时间引入了本地搜索操作员的组合，从而显着改善了解决方案和补充ERRL。我们定性地证明，对于车辆路由问题，具有更高熵的政策可以使优化景观平滑，从而更容易优化。定量评估表明，该模型的性能与最先进的变体相媲美。在我们的评估中，我们从实验上说明该模型在车辆路由问题（例如电容车辆路由问题（CVRP）），具有固定机队问题（MRPFF）和旅行推销员问题等车辆路线问题（CVRP）等变体上产生最先进的性能。

Deep reinforcement learning (DRL) has been used to learn effective heuristics for solving complex combinatorial optimisation problem via policy networks and have demonstrated promising performance. Existing works have focused on solving (vehicle) routing problems as they have a nice balance between non-triviality and difficulty. State-of-the-art approaches learn a policy using reinforcement learning, and the learnt policy acts as a pseudo solver. These approaches have demonstrated good performance in some cases, but given the large search space typical combinatorial/routing problem, they can converge too quickly to poor policy. To prevent this, in this paper, we propose an approach name entropy regularised reinforcement learning (ERRL) that supports exploration by providing more stochastic policies, which tends to improve optimisation. Empirically, the low variance ERRL offers RL training fast and stable. We also introduce a combination of local search operators during test time, which significantly improves solution and complement ERRL. We qualitatively demonstrate that for vehicle routing problems, a policy with higher entropy can make the optimisation landscape smooth which makes it easier to optimise. The quantitative evaluation shows that the performance of the model is comparable with the state-of-the-art variants. In our evaluation, we experimentally illustrate that the model produces state-of-the-art performance on variants of Vehicle Routing problems such as Capacitated Vehicle Routing Problem (CVRP), Multiple Routing with Fixed Fleet Problems (MRPFF) and Travelling Salesman problem.

下载PDF全文

下载文献需遵守相关版权规定

论文标题