Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation

Paper title

Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation

Authors

Maximilian Igl, Daewoo Kim, Alex Kuefler, Paul Mougin, Punit Shah, Kyriacos Shiarlis, Dragomir Anguelov, Mark Palatucci, Brandyn White, Shimon Whiteson

Abstract

Simulation is a crucial tool for accelerating the development of autonomous vehicles. Making simulation realistic requires models of the human road users who interact with such cars. Such models can be obtained by applying learning from demonstration (LfD) to trajectories observed by cars already on the road. However, existing LfD methods are typically insufficient, yielding policies that frequently collide or drive off the road. To address this problem, we propose Symphony, which greatly improves realism by combining conventional policies with a parallel beam search. The beam search refines these policies on the fly by pruning branches that are unfavourably evaluated by a discriminator. However, it can also harm diversity, i.e., how well the agents cover the entire distribution of realistic behaviour, as pruning can encourage mode collapse. Symphony addresses this issue with a hierarchical approach, factoring agent behaviour into goal generation and goal conditioning. The use of such goals ensures that agent diversity neither disappears during adversarial training nor is pruned away by the beam search. Experiments on both proprietary and open Waymo datasets confirm that Symphony agents learn more realistic and diverse behaviour than several baselines.
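The abstract's central mechanism, a beam search that prunes rollout branches scored poorly by a discriminator, can be illustrated with a toy sketch. This is not the authors' implementation: the one-dimensional state, `toy_policy`, `toy_discriminator`, and every parameter name below are hypothetical stand-ins for the learned components, and the hierarchical goal generation and goal conditioning are omitted entirely.

```python
import random
from typing import Callable, List, Tuple

State = float              # toy state: lateral offset from the lane centre
Trajectory = List[State]


def toy_policy(state: State, rng: random.Random) -> State:
    """Hypothetical stand-in for a learned policy: propose a noisy next state."""
    return state + rng.gauss(0.0, 0.5)


def toy_discriminator(state: State) -> float:
    """Hypothetical stand-in for a learned discriminator.

    Higher means "more realistic"; here that simply means closer to the
    lane centre at 0.0.
    """
    return -abs(state)


def beam_search_rollout(
    start: State,
    policy: Callable[[State, random.Random], State],
    discriminator: Callable[[State], float],
    horizon: int = 10,
    beam_width: int = 4,
    branches: int = 3,
    seed: int = 0,
) -> Tuple[Trajectory, float]:
    """Roll out the policy while pruning low-scoring branches at every step."""
    rng = random.Random(seed)
    # Each beam entry is (cumulative discriminator score, trajectory so far).
    beam: List[Tuple[float, Trajectory]] = [(0.0, [start])]
    for _ in range(horizon):
        candidates: List[Tuple[float, Trajectory]] = []
        for score, traj in beam:
            # Expand every surviving trajectory with several sampled actions.
            for _ in range(branches):
                nxt = policy(traj[-1], rng)
                candidates.append((score + discriminator(nxt), traj + [nxt]))
        # Prune: keep only the branches the discriminator rates highest.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    best_score, best_traj = beam[0]
    return best_traj, best_score
```

Calling `beam_search_rollout(2.0, toy_policy, toy_discriminator)` returns the highest-scoring surviving trajectory and its cumulative score. Because pruning keeps only the top-scoring branches, the survivors tend to resemble one another, which is the loss of diversity the abstract says Symphony counters with goals.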
