Paper Title
The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective
Paper Authors
Paper Abstract
As various post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to develop a deeper understanding of whether and when the explanations output by these methods disagree with each other, and how such disagreements are resolved in practice. However, there is little to no research that provides answers to these critical questions. In this work, we formalize and study the disagreement problem in explainable machine learning. More specifically, we define the notion of disagreement between explanations, analyze how often such disagreements occur in practice, and how practitioners resolve these disagreements. We first conduct interviews with data scientists to understand what constitutes disagreement between explanations generated by different methods for the same model prediction, and introduce a novel quantitative framework to formalize this understanding. We then leverage this framework to carry out a rigorous empirical analysis with four real-world datasets, six state-of-the-art post hoc explanation methods, and six different predictive models, to measure the extent of disagreement between the explanations generated by various popular explanation methods. In addition, we carry out an online user study with data scientists to understand how they resolve the aforementioned disagreements. Our results indicate that (1) state-of-the-art explanation methods often disagree in terms of the explanations they output, and (2) machine learning practitioners often employ ad hoc heuristics when resolving such disagreements. These findings suggest that practitioners may be relying on misleading explanations when making consequential decisions. They also underscore the importance of developing principled frameworks for effectively evaluating and comparing explanations output by various explanation techniques.
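The abstract refers to a quantitative framework for measuring disagreement between explanations but does not spell it out here. As a minimal illustrative sketch only, not the paper's actual framework, one simple way to quantify such disagreement is top-k feature agreement: the overlap between the features that two post hoc explanations rank as most important for the same model prediction. The function name top_k_feature_agreement and the attribution vectors lime_like and shap_like below are hypothetical placeholders, not outputs of any real explainer.

```python
import numpy as np

def top_k_feature_agreement(attr_a, attr_b, k=5):
    """Fraction of overlap between the top-k features (by absolute
    attribution) of two explanations for the same prediction.
    1.0 = full agreement on the most important features; 0.0 = none."""
    top_a = set(np.argsort(-np.abs(attr_a))[:k])
    top_b = set(np.argsort(-np.abs(attr_b))[:k])
    return len(top_a & top_b) / k

# Hypothetical attributions from two different post hoc explainers
# (e.g., a LIME-style and a SHAP-style method) for one prediction.
lime_like = np.array([0.40, -0.10, 0.05, 0.30, -0.25, 0.02])
shap_like = np.array([0.05, -0.35, 0.45, 0.30, -0.05, 0.20])

print(top_k_feature_agreement(lime_like, shap_like, k=3))  # 0.333...
```

Consistently low agreement of this kind across many test instances would be one concrete signal of the disagreement between explanation methods that the abstract describes.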