Paper Title
Generating and Detecting True Ambiguity: A Forgotten Danger in DNN Supervision Testing
Paper Authors
Paper Abstract
Deep Neural Networks (DNNs) are becoming a crucial component of modern software systems, but they are prone to fail under conditions that differ from those observed during training (out-of-distribution inputs) or on inputs that are truly ambiguous, i.e., inputs that admit multiple classes with nonzero probability in their labels. Recent work proposed DNN supervisors to detect high-uncertainty inputs before their possible misclassification leads to any harm. To test and compare the capabilities of DNN supervisors, researchers proposed test generation techniques that focus the testing effort on high-uncertainty inputs that should be recognized as anomalous by supervisors. However, existing test generators aim to produce out-of-distribution inputs. No existing model- and supervisor-independent technique targets the generation of truly ambiguous test inputs, i.e., inputs that admit multiple classes according to expert human judgment. In this paper, we propose a novel way to generate ambiguous inputs to test DNN supervisors and use it to empirically compare several existing supervisor techniques. In particular, we propose AmbiGuess to generate ambiguous samples for image classification problems. AmbiGuess is based on gradient-guided sampling in the latent space of a regularized adversarial autoencoder. Moreover, we conducted what is, to the best of our knowledge, the most extensive comparative study of DNN supervisors, considering their capabilities to detect four distinct types of high-uncertainty inputs, including truly ambiguous ones. We find that the tested supervisors' capabilities are complementary: those best suited to detect true ambiguity perform worse on invalid, out-of-distribution, and adversarial inputs, and vice versa.
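To make the abstract's core mechanism concrete, below is a minimal, hypothetical Python/TensorFlow sketch of gradient-guided sampling in an autoencoder's latent space to produce an ambiguous input. It is not the paper's AmbiGuess implementation: it assumes pre-trained Keras decoder and classifier models, uses a plain autoencoder as a stand-in for the paper's regularized adversarial autoencoder, and all names and hyperparameters are illustrative.

    # Hypothetical sketch (not the authors' AmbiGuess code): optimize a
    # latent code so the decoded image is classified with roughly equal
    # probability as two chosen classes, i.e., a truly ambiguous input.
    import tensorflow as tf

    def sample_ambiguous(decoder, classifier, latent_dim, class_a, class_b,
                         steps=200, lr=0.05):
        # Start from a random point in the (regularized) latent space.
        z = tf.Variable(tf.random.normal([1, latent_dim]))
        # Target distribution: probability mass split 50/50 between the
        # two classes, zero elsewhere.
        num_classes = classifier.output_shape[-1]
        target = tf.tensor_scatter_nd_update(
            tf.zeros([1, num_classes]),
            [[0, class_a], [0, class_b]],
            [0.5, 0.5])
        opt = tf.keras.optimizers.Adam(learning_rate=lr)
        for _ in range(steps):
            with tf.GradientTape() as tape:
                image = decoder(z)         # latent code -> candidate image
                probs = classifier(image)  # image -> class probabilities
                # Cross-entropy toward the 50/50 target steers the sample
                # toward the decision boundary between the two classes.
                loss = tf.keras.losses.categorical_crossentropy(target, probs)
            grads = tape.gradient(loss, [z])
            opt.apply_gradients(zip(grads, [z]))
        return decoder(z)

The design choice illustrated here is the one stated in the abstract: sampling is guided by gradients in the latent space rather than in pixel space, so candidate inputs stay on the learned data manifold instead of degenerating into adversarial pixel noise.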