Paper Title

Analogies and Feature Attributions for Model Agnostic Explanation of Similarity Learners

Paper Authors

Karthikeyan Natesan Ramamurthy, Amit Dhurandhar, Dennis Wei, Zaid Bin Tariq

Paper Abstract

Post-hoc explanations for black box models have been studied extensively in classification and regression settings. However, explanations for models that output similarity between two inputs have received comparatively little attention. In this paper, we provide model agnostic local explanations for similarity learners applicable to tabular and text data. We first propose a method that provides feature attributions to explain the similarity between a pair of inputs as determined by a black box similarity learner. We then propose analogies as a new form of explanation in machine learning. Here the goal is to identify diverse analogous pairs of examples that share the same level of similarity as the input pair and provide insight into (latent) factors underlying the model's prediction. The selection of analogies can optionally leverage feature attributions, thus connecting the two forms of explanation while still maintaining complementarity. We prove that our analogy objective function is submodular, making the search for good-quality analogies efficient. We apply the proposed approaches to explain similarities between sentences as predicted by a state-of-the-art sentence encoder, and between patients in a healthcare utilization application. Efficacy is measured through quantitative evaluations, a careful user study, and examples of explanations.
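The practical payoff of the submodularity result in the abstract is that a simple greedy algorithm selects analogies with the classic (1 − 1/e) approximation guarantee for monotone submodular objectives. The sketch below illustrates this selection pattern only; the `coverage` objective (a toy set-cover function over hypothetical latent factors) is a stand-in, not the paper's actual analogy objective.

```python
def greedy_select(candidates, objective, k):
    """Greedily pick k items maximizing a monotone submodular objective.

    At each step, choose the candidate with the largest marginal gain;
    for monotone submodular objectives this achieves a (1 - 1/e)
    approximation to the optimal size-k subset.
    """
    selected = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        best = max(
            remaining,
            key=lambda c: objective(selected + [c]) - objective(selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


def coverage(items):
    """Toy monotone submodular objective: number of distinct latent
    factors covered by the chosen analogous pairs (set cover)."""
    covered = set()
    for factors in items:
        covered |= factors
    return len(covered)


# Hypothetical candidate pairs, each annotated with the latent factors
# it would illustrate.
pairs = [{"tense"}, {"tense", "negation"}, {"topic"}, {"negation"}]
chosen = greedy_select(pairs, coverage, 2)
```

Greedy first picks the pair covering both "tense" and "negation" (marginal gain 2), then the "topic" pair (the only remaining candidate with positive gain), so two selections already cover all three factors.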
