Paper Title
Amortized Q-learning with Model-based Action Proposals for Autonomous Driving on Highways
Paper Authors
Paper Abstract
Well-established optimization-based methods can guarantee an optimal trajectory for a short optimization horizon, typically no longer than a few seconds. As a result, choosing the optimal trajectory for this short horizon may still lead to a sub-optimal long-term solution. At the same time, the resulting short-term trajectories allow for effective, comfortable, and provably safe maneuvers in a dynamic traffic environment. In this work, we address the question of how to ensure an optimal long-term driving strategy while keeping the benefits of classical trajectory planning. We introduce a Reinforcement Learning based approach that, coupled with a trajectory planner, learns an optimal long-term decision-making strategy for driving on highways. By generating locally optimal maneuvers online and using them as actions, we strike a balance between an infinite low-level continuous action space and the limited flexibility of a fixed number of predefined standard lane-change actions. We evaluated our method on realistic scenarios in the open-source traffic simulator SUMO and achieved better performance than the four benchmark approaches we compared against: a random action-selecting agent, a greedy agent, a high-level discrete-action agent, and an IDM-based SUMO-controlled agent.