Pedestrian Action Anticipation using Contextual Feature Fusion in Stacked RNNs

Paper Title

Pedestrian Action Anticipation using Contextual Feature Fusion in Stacked RNNs

Authors

Amir Rasouli, Iuliia Kotseruba, John K. Tsotsos

Abstract

One of the major challenges for autonomous vehicles in urban environments is to understand and predict other road users' actions, in particular, pedestrians at the point of crossing. The common approach to solving this problem is to use the motion history of the agents to predict their future trajectories. However, pedestrians exhibit highly variable actions most of which cannot be understood without visual observation of the pedestrians themselves and their surroundings. To this end, we propose a solution for the problem of pedestrian action anticipation at the point of crossing. Our approach uses a novel stacked RNN architecture in which information collected from various sources, both scene dynamics and visual features, is gradually fused into the network at different levels of processing. We show, via extensive empirical evaluations, that the proposed algorithm achieves a higher prediction accuracy compared to alternative recurrent network architectures. We conduct experiments to investigate the impact of the length of observation, time to event and types of features on the performance of the proposed method. Finally, we demonstrate how different data fusion strategies impact prediction accuracy.
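
The gradual fusion described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the plain tanh recurrent cell, the random untrained weights, the feature names (`visual`, `speed`), their order, and the function names are all assumptions made for illustration. The only idea taken from the abstract is that each level of the stack receives the hidden states of the level below, concatenated with one additional feature source.

```python
import math
import random


def make_cell(input_size, hidden_size, rng):
    """Create weights for a plain tanh RNN cell (a stand-in for a gated cell)."""
    scale = 1.0 / math.sqrt(hidden_size)
    w_in = [[rng.uniform(-scale, scale) for _ in range(input_size)]
            for _ in range(hidden_size)]
    w_h = [[rng.uniform(-scale, scale) for _ in range(hidden_size)]
           for _ in range(hidden_size)]
    return w_in, w_h


def run_cell(cell, inputs):
    """Run the cell over a sequence; return the hidden state at every step."""
    w_in, w_h = cell
    h = [0.0] * len(w_h)
    outputs = []
    for x in inputs:
        h = [
            math.tanh(
                sum(wi * xi for wi, xi in zip(row_in, x))
                + sum(wh * hj for wh, hj in zip(row_h, h))
            )
            for row_in, row_h in zip(w_in, w_h)
        ]
        outputs.append(h)
    return outputs


def stacked_fusion(feature_streams, hidden_size=8, seed=0):
    """Fuse one feature stream per level of a stacked RNN.

    feature_streams: list of sequences, one per source; each sequence is a
    list of T feature vectors. Returns an (untrained) crossing probability.
    """
    rng = random.Random(seed)
    hidden_seq = None
    for stream in feature_streams:
        if hidden_seq is None:
            level_input = stream  # the first level sees only its own features
        else:
            # Higher levels see the lower level's hidden states concatenated
            # with the next feature source, step by step.
            level_input = [h + x for h, x in zip(hidden_seq, stream)]
        cell = make_cell(len(level_input[0]), hidden_size, rng)
        hidden_seq = run_cell(cell, level_input)
    # Logistic readout from the last hidden state of the top level.
    w_out = [rng.uniform(-1.0, 1.0) for _ in range(hidden_size)]
    logit = sum(w * h for w, h in zip(w_out, hidden_seq[-1]))
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    T = 5
    visual = [[0.1 * t, 0.2] for t in range(T)]  # hypothetical visual features
    speed = [[float(t)] for t in range(T)]       # hypothetical scene dynamics
    print(stacked_fusion([visual, speed]))
```

Because the weights are random and untrained, the returned value only shows the data flow. A real model would learn the weights from labeled crossing events and would likely use a gated recurrent cell.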
