Paper Title
AI Model Utilization Measurements For Finding Class Encoding Patterns
Paper Authors
Paper Abstract
This work addresses the problems of (a) designing utilization measurements of trained artificial intelligence (AI) models and (b) explaining how training data are encoded in AI models based on those measurements. The problems are motivated by the lack of explainability of AI models in security- and safety-critical applications, such as the use of AI models for classification of traffic signs in self-driving cars. We approach the problems by introducing theoretical underpinnings of AI model utilization measurement and understanding patterns in utilization-based class encodings of traffic signs at the level of computation graphs (AI models), subgraphs, and graph nodes. Conceptually, utilization is defined at each graph node (computation unit) of an AI model based on the number and distribution of unique outputs in the space of all possible outputs (tensor-states). In this work, utilization measurements are extracted from AI models, which include poisoned and clean AI models. In contrast to clean AI models, the poisoned AI models were trained with traffic sign images containing systematic, physically realizable traffic sign modifications (i.e., triggers) designed to change a correct class label to another label in the presence of such a trigger. We analyze class encodings of such clean and poisoned AI models, and conclude with implications for trojan injection and detection.
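The abstract's notion of per-node utilization can be illustrated with a minimal sketch. This is not the paper's exact definition: here we hypothetically discretize each node output into a binary tensor-state and take utilization as the fraction of distinct states observed over the inputs; the names `node_utilization` and the sign-based thresholding are illustrative assumptions only.

```python
import numpy as np

def node_utilization(outputs: np.ndarray, threshold: float = 0.0) -> float:
    """Toy utilization estimate for one computation node.

    outputs: array of shape (num_inputs, num_units) holding the node's
    outputs over a dataset. Each output vector is discretized by
    `threshold` into a binary tensor-state; utilization is the number of
    distinct observed states divided by the number of inputs.
    (Illustrative definition; the paper's measure over tensor-states
    may differ.)
    """
    states = (outputs > threshold).astype(np.uint8)     # discretize outputs
    unique_states = np.unique(states, axis=0).shape[0]  # distinct tensor-states
    return unique_states / outputs.shape[0]

rng = np.random.default_rng(0)
# Diverse outputs map to many distinct states -> higher utilization.
varied = rng.normal(size=(100, 16))
# Outputs collapsed onto one repeated vector -> a single state.
collapsed = np.tile(rng.normal(size=(1, 16)), (100, 1))
print(node_utilization(varied))
print(node_utilization(collapsed))
```

Under this toy definition, a node whose outputs collapse onto a few tensor-states (as one might expect for a narrowly triggered behavior) scores much lower than a node producing diverse outputs.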