Paper Title
A Psychological Theory of Explainability
Paper Authors
Paper Abstract
The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there is no computationally precise theory of how humans interpret AI-generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inferences conditioned on explanation. Our theory posits that, absent an explanation, humans expect the AI to make decisions similar to their own, and that they interpret an explanation by comparing it to the explanation they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrates that our theory quantitatively matches participants' predictions of the AI.
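For reference, Shepard's universal law of generalization states that the probability of generalizing a response from one stimulus to another decays exponentially with their distance in psychological (similarity) space. A minimal sketch of how this could map onto the theory described above (the variable names and exact functional form are illustrative, not the authors' notation) is:

$$ g(e_{\mathrm{AI}}, e_{\mathrm{self}}) = \exp\!\big(-\, d(e_{\mathrm{AI}}, e_{\mathrm{self}})\big), $$

where $e_{\mathrm{AI}}$ is the saliency-map explanation shown to the explainee, $e_{\mathrm{self}}$ is the explanation the explainee would give, $d(\cdot,\cdot)$ is a distance in the similarity space, and $g$ is the explainee's resulting expectation that the AI decides as they themselves would.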