Paper Title
Amortized Q-learning with Model-based Action Proposals for Autonomous Driving on Highways
Paper Authors
Paper Abstract
Well-established optimization-based methods can guarantee an optimal trajectory for a short optimization horizon, typically no longer than a few seconds. As a result, choosing the optimal trajectory for this short horizon may still lead to a sub-optimal long-term solution. At the same time, the resulting short-term trajectories allow for effective, comfortable, and provably safe maneuvers in a dynamic traffic environment. In this work, we address the question of how to ensure an optimal long-term driving strategy while keeping the benefits of classical trajectory planning. We introduce a Reinforcement Learning based approach that, coupled with a trajectory planner, learns an optimal long-term decision-making strategy for driving on highways. By generating locally optimal maneuvers online and using them as actions, we strike a balance between an infinite low-level continuous action space and the limited flexibility of a fixed number of predefined standard lane-change actions. We evaluated our method on realistic scenarios in the open-source traffic simulator SUMO and achieved better performance than the four benchmark approaches we compared against: a random action-selecting agent, a greedy agent, a high-level discrete-action agent, and an IDM-based SUMO-controlled agent.