Paper Title

Reinforcement Learning based Control of Imitative Policies for Near-Accident Driving

Paper Authors

Cao, Zhangjie, Bıyık, Erdem, Wang, Woodrow Z., Raventos, Allan, Gaidon, Adrien, Rosman, Guy, Sadigh, Dorsa

Paper Abstract

Autonomous driving has achieved significant progress in recent years, but autonomous cars are still unable to tackle high-risk situations where a potential accident is likely. In such near-accident scenarios, even a minor change in the vehicle's actions may result in drastically different consequences. To avoid unsafe actions in near-accident scenarios, we need to fully explore the environment. However, reinforcement learning (RL) and imitation learning (IL), two widely-used policy learning methods, cannot model rapid phase transitions and are not scalable to fully cover all the states. To address driving in near-accident scenarios, we propose a hierarchical reinforcement and imitation learning (H-ReIL) approach that consists of low-level policies learned by IL for discrete driving modes, and a high-level policy learned by RL that switches between different driving modes. Our approach exploits the advantages of both IL and RL by integrating them into a unified learning framework. Experimental results and user studies suggest our approach can achieve higher efficiency and safety compared to other methods. Analyses of the policies demonstrate our high-level policy appropriately switches between different low-level policies in near-accident driving situations.
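To make the hierarchical structure described in the abstract concrete, below is a minimal, illustrative Python sketch of an H-ReIL-style control loop. It assumes IL-trained low-level policies (one per discrete driving mode) and an RL-trained high-level policy that picks a mode at each step; the class names, the `run_episode` helper, and the Gym-style `env` interface are our own assumptions for illustration, not the authors' implementation.

```python
import random

class LowLevelPolicy:
    """Placeholder for an IL-trained policy of one driving mode (e.g. aggressive or timid)."""
    def __init__(self, mode):
        self.mode = mode

    def act(self, observation):
        # A real IL policy would map the observation to continuous controls.
        return {"mode": self.mode, "steer": 0.0, "throttle": 0.5}

class HighLevelPolicy:
    """Placeholder for the RL-trained policy that switches between driving modes."""
    def __init__(self, num_modes):
        self.num_modes = num_modes

    def select_mode(self, observation):
        # A trained policy would choose the mode from the observation;
        # random choice here only keeps the sketch runnable.
        return random.randrange(self.num_modes)

def run_episode(env, high_level, low_level_policies, horizon=100):
    """Hierarchical rollout: pick a mode, then act with that mode's IL policy."""
    obs = env.reset()
    for _ in range(horizon):
        mode = high_level.select_mode(obs)
        action = low_level_policies[mode].act(obs)
        obs, reward, done, info = env.step(action)
        if done:
            break
```

The intended benefit, as the abstract describes, is that RL only has to explore a small discrete space of mode switches while IL handles the continuous low-level control, combining the strengths of both methods.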
