Paper title
Automatic Test Suite Generation for Key-Points Detection DNNs using Many-Objective Search (Experience Paper)
Paper authors
Abstract
Automatically detecting the positions of key-points (e.g., facial key-points or finger key-points) in an image is an essential problem in many applications, such as driver's gaze detection and drowsiness detection in automated driving systems. With the recent advances in Deep Neural Networks (DNNs), Key-Points detection DNNs (KP-DNNs) have been increasingly employed for that purpose. Nevertheless, KP-DNN testing and validation remain a challenging problem because KP-DNNs predict many independent key-points at the same time -- where each individual key-point may be critical in the targeted application -- and images can vary a great deal according to many factors. In this paper, we present an approach to automatically generate test data for KP-DNNs using many-objective search. In our experiments, focused on facial key-points detection DNNs developed for an industrial automotive application, we show that our approach can generate test suites that cause severe mispredictions for, on average, more than 93% of all key-points. In comparison, random search-based test data generation can only severely mispredict 41% of them. Many of these mispredictions, however, are unavoidable and should therefore not be considered failures. We also empirically compare state-of-the-art, many-objective search algorithms and their variants, tailored for test suite generation. Furthermore, we investigate and demonstrate how to learn specific conditions, based on image characteristics (e.g., head posture and skin color), that lead to severe mispredictions. Such conditions serve as a basis for risk analysis or DNN retraining.
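To illustrate the idea of treating each key-point as a separate search objective, the following is a minimal sketch of an archive-based search. It is not one of the algorithms compared in the paper. The error function `keypoint_errors`, the constants `NUM_KEYPOINTS` and `SEVERE_THRESHOLD`, and the encoding of a test input as a list of floats are all hypothetical stand-ins; in a real setting the error would come from running the KP-DNN on a generated image.

```python
import random

NUM_KEYPOINTS = 5        # hypothetical; a real KP-DNN predicts many more
SEVERE_THRESHOLD = 0.5   # hypothetical cut-off for a "severe" misprediction


def keypoint_errors(test_input):
    """Toy stand-in for running the DNN: one error value per key-point."""
    n = len(test_input)
    return [max(0.0, test_input[k] - 0.5 * test_input[(k + 1) % n])
            for k in range(n)]


def random_input(rng):
    """Sample a test input; here simply a vector of values in [0, 1]."""
    return [rng.random() for _ in range(NUM_KEYPOINTS)]


def mutate(test_input, rng, sigma=0.1):
    """Perturb every value with Gaussian noise, clamped to [0, 1]."""
    return [min(1.0, max(0.0, x + rng.gauss(0.0, sigma))) for x in test_input]


def update_archive(archive, candidate):
    """Keep, for each key-point, the input with the largest error so far."""
    for k, err in enumerate(keypoint_errors(candidate)):
        if k not in archive or err > archive[k][0]:
            archive[k] = (err, candidate)


def search(budget, seed=0):
    """Run `budget` evaluations and return {key-point: (error, input)}."""
    rng = random.Random(seed)
    archive = {}
    update_archive(archive, random_input(rng))
    for _ in range(budget - 1):
        # Mutate the best-known input of a randomly chosen key-point.
        _, parent = archive[rng.randrange(NUM_KEYPOINTS)]
        update_archive(archive, mutate(parent, rng))
    return archive


def severe_fraction(archive):
    """Fraction of key-points whose best error exceeds the threshold."""
    hits = sum(1 for err, _ in archive.values() if err > SEVERE_THRESHOLD)
    return hits / NUM_KEYPOINTS


if __name__ == "__main__":
    result = search(budget=500)
    print(f"severely mispredicted key-points: {severe_fraction(result):.0%}")
```

The archive ends up holding one test input per key-point, so the resulting test suite has at most `NUM_KEYPOINTS` entries. A figure such as the 93% reported in the abstract corresponds to the kind of quantity `severe_fraction` computes, although the paper's own definition of severity may differ from the threshold assumed here.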