Paper Title
Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces
Paper Authors
Paper Abstract
Explainable AI aims to overcome the black-box nature of complex ML models like neural networks by generating explanations for their predictions. Explanations often take the form of a heatmap identifying the input features (e.g. pixels) that are relevant to the model's decision. These explanations, however, entangle the potentially multiple factors that enter into the overall complex decision strategy. We propose to disentangle explanations by extracting, at some intermediate layer of a neural network, subspaces that capture the multiple and distinct activation patterns (e.g. visual concepts) that are relevant to the prediction. To automatically extract these subspaces, we propose two new analyses that extend principles found in PCA and ICA to explanations. These novel analyses, which we call principal relevant component analysis (PRCA) and disentangled relevant subspace analysis (DRSA), maximize relevance instead of, e.g., variance or kurtosis. This allows the analysis to focus much more strongly on what the ML model actually uses for predicting, ignoring activations or concepts to which the model is invariant. Our approach is general enough to work alongside common attribution techniques such as Shapley Value, Integrated Gradients, or LRP. Our proposed methods prove to be practically useful and compare favorably to the state of the art, as demonstrated on benchmarks and in three use cases.
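
To make the relevance-maximization idea concrete, below is a minimal NumPy sketch of a PRCA-style analysis. It assumes, as in LRP-type attributions, that the relevance of an example at the chosen layer can be written as an inner product between its activation vector a and a matching "context" vector c, so that the relevance falling into a subspace with orthonormal basis U is a^T U U^T c. The function name, array shapes, and this particular eigenvalue formulation are illustrative assumptions, not the authors' reference implementation.

import numpy as np

def prca_subspace(A, C, d):
    """Illustrative PRCA-style subspace extraction (sketch, not reference code).

    A : (N, h) activations at some intermediate layer, one row per example.
    C : (N, h) context vectors, assumed such that the relevance of example n
        at this layer is (approximately) the inner product A[n] @ C[n].
    d : dimensionality of the relevant subspace to extract.

    Returns an (h, d) orthonormal basis U whose span maximizes the summed
    relevance A[n] @ U @ U.T @ C[n] over the dataset, i.e. relevance rather
    than variance is the quantity being maximized.
    """
    # Symmetrized cross-covariance between activations and context vectors.
    S = (A.T @ C + C.T @ A) / (2 * len(A))
    # Top-d eigenvectors of S span the relevance-maximizing subspace.
    eigvals, eigvecs = np.linalg.eigh(S)
    return eigvecs[:, np.argsort(eigvals)[::-1][:d]]

# Toy usage with random arrays standing in for real activations and relevances.
rng = np.random.default_rng(0)
A = rng.normal(size=(512, 64))  # activations of 512 examples, 64 hidden units
C = rng.normal(size=(512, 64))  # matching context vectors from an attribution pass
U = prca_subspace(A, C, d=4)    # orthonormal basis of a 4-dimensional relevant subspace
print(U.shape)                  # (64, 4)

DRSA, as described in the abstract, seeks several distinct subspaces of this kind so that the relevant activation patterns are disentangled from one another; its grouped objective is beyond the scope of this sketch.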