Paper Title
Attention cannot be an Explanation
Paper Authors
Paper Abstract
Attention-based explanations (viz. saliency maps) are assumed to improve human trust and reliance in the underlying models by providing interpretability to black-box models such as deep neural networks. Recently, it has been shown that attention weights are frequently uncorrelated with gradient-based measures of feature importance. Motivated by this, we ask a follow-up question: "Assuming that we consider only the tasks where attention weights correlate well with feature importance, how effective are these attention-based explanations in increasing human trust and reliance in the underlying models?" In other words, can we use attention as an explanation? We perform extensive human-study experiments that aim to qualitatively and quantitatively assess the degree to which attention-based explanations are suitable for increasing human trust and reliance. Our experimental results show that attention cannot be used as an explanation.