Paper Title
Is my Driver Observation Model Overconfident? Input-guided Calibration Networks for Reliable and Interpretable Confidence Estimates
Paper Authors
Paper Abstract
Driver observation models are rarely deployed under perfect conditions. In practice, illumination, camera placement and camera type differ from those present during training, and unforeseen behaviours may occur at any time. While observing the human behind the steering wheel leads to more intuitive human-vehicle interaction and safer driving, it requires recognition algorithms that not only predict the correct driver state but also quantify their prediction quality through realistic and interpretable confidence measures. Reliable uncertainty estimates are crucial for building trust, and their absence is a serious obstacle to deploying activity recognition networks in real driving systems. In this work, we examine for the first time how well the confidence values of modern driver observation models match the probability of a correct outcome, and show that raw neural network-based approaches tend to significantly overestimate their prediction quality. To correct this misalignment between confidence values and actual uncertainty, we consider two strategies. First, we enhance two activity recognition models commonly used for driver observation with temperature scaling, an off-the-shelf method for confidence calibration in image classification. Then, we introduce Calibrated Action Recognition with Input Guidance (CARING), a novel approach that leverages an additional neural network to learn how to scale the confidences depending on the video representation. Extensive experiments on the Drive&Act dataset demonstrate that both strategies drastically improve the quality of model confidences, while our CARING model outperforms both the original architectures and their temperature-scaled variants, yielding the best uncertainty estimates.
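The abstract contrasts two calibration strategies: a single global temperature applied to all logits, and an input-guided network (CARING) that predicts a per-sample temperature from the video representation. The following is a minimal sketch of both ideas, assuming a PyTorch setup; the module names, the feature dimension (512) and the layer sizes are illustrative assumptions and do not reproduce the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemperatureScaling(nn.Module):
    """Post-hoc calibration with a single learned temperature T > 0 (global scaling)."""

    def __init__(self):
        super().__init__()
        self.log_t = nn.Parameter(torch.zeros(1))  # T = exp(log_t) stays positive

    def forward(self, logits):
        return logits / self.log_t.exp()


class InputGuidedCalibration(nn.Module):
    """CARING-style idea: a small network predicts a per-sample temperature
    from the video representation, so confidences are scaled input-dependently."""

    def __init__(self, feature_dim=512, hidden_dim=128):
        super().__init__()
        self.temperature_net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Softplus(),  # keeps the predicted temperature positive
        )

    def forward(self, logits, video_features):
        temperature = self.temperature_net(video_features) + 1e-6  # shape (batch, 1)
        return logits / temperature


if __name__ == "__main__":
    # Toy calibration step on held-out validation logits of a frozen classifier.
    logits = torch.randn(8, 12)        # e.g. 12 driver activity classes (assumed)
    features = torch.randn(8, 512)     # backbone video representation (assumed dim)
    labels = torch.randint(0, 12, (8,))

    caring = InputGuidedCalibration(feature_dim=512)
    loss = F.cross_entropy(caring(logits, features), labels)
    loss.backward()                    # only the calibration parameters get gradients
    confidences = F.softmax(caring(logits, features), dim=1).max(dim=1).values
    print(confidences)
```

In the post-hoc calibration setting sketched here, the recognition backbone stays frozen and only the calibration module is fitted on a held-out validation split, so the predicted labels are unchanged while the confidence values are rescaled toward the observed accuracy.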