Paper Title
Model Interpretability through the Lens of Computational Complexity
Paper Authors
Paper Abstract
In spite of several claims stating that some models are more interpretable than others -- e.g., "linear models are more interpretable than deep neural networks" -- we still lack a principled notion of interpretability to formally compare among different classes of models. We make a step towards such a notion by studying whether folklore interpretability claims have a correlate in terms of computational complexity theory. We focus on local post-hoc explainability queries that, intuitively, attempt to answer why individual inputs are classified in a certain way by a given model. In a nutshell, we say that a class $\mathcal{C}_1$ of models is more interpretable than another class $\mathcal{C}_2$, if the computational complexity of answering post-hoc queries for models in $\mathcal{C}_2$ is higher than for those in $\mathcal{C}_1$. We prove that this notion provides a good theoretical counterpart to current beliefs on the interpretability of models; in particular, we show that under our definition and assuming standard complexity-theoretical assumptions (such as P$\neq$NP), both linear and tree-based models are strictly more interpretable than neural networks. Our complexity analysis, however, does not provide a clear-cut difference between linear and tree-based models, as we obtain different results depending on the particular post-hoc explanations considered. Finally, by applying a finer complexity analysis based on parameterized complexity, we are able to prove a theoretical result suggesting that shallow neural networks are more interpretable than deeper ones.
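As a minimal formal sketch of the comparison relation described above (the symbols $\preceq_Q$ and $\mathrm{Compl}$ are illustrative names chosen here, not notation from the paper), fix a post-hoc explainability query $Q$ and write:

% Illustrative formalization, assuming the prose definition in the abstract;
% Compl(Q, C) names the computational complexity of answering query Q over
% models in class C (our naming, not the paper's).
\[
  \mathcal{C}_1 \preceq_Q \mathcal{C}_2
  \;\Longleftrightarrow\;
  \mathrm{Compl}(Q, \mathcal{C}_1) \subseteq \mathrm{Compl}(Q, \mathcal{C}_2),
\]

read as "$\mathcal{C}_1$ is at least as interpretable as $\mathcal{C}_2$ with respect to $Q$". For instance, if answering $Q$ is in P for linear models but NP-hard for neural networks, then under this reading linear models are strictly more interpretable with respect to $Q$, assuming P $\neq$ NP.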