Paper title
Melody: Generating and Visualizing Machine Learning Model Summary to Understand Data and Classifiers Together
Authors
Abstract
With the increasing sophistication of machine learning models, there is a growing trend toward model explanation techniques that focus on a single instance (local explanation) to ensure faithfulness to the original model. While these techniques provide accurate model interpretability on various data primitives (e.g., tabular, image, or text), a holistic Explainable Artificial Intelligence (XAI) experience also requires a global explanation of the model and dataset to enable sensemaking at different granularities. Thus, there is vast potential in synergizing model explanation and visual analytics approaches. In this paper, we present MELODY, an interactive algorithm that constructs an optimal global overview of model and data behavior by summarizing local explanations using information theory. The result (i.e., an explanation summary) does not require additional learning models, restrictions on data primitives, or machine learning knowledge from the users. We also design MELODY UI, an interactive visual analytics system that demonstrates how the explanation summary connects the dots across various XAI tasks, from a global overview to local inspections. We present three usage scenarios covering tabular, image, and text classification to illustrate how model interpretability generalizes across different data. Our experiments show that our approach (1) provides a better explanation summary than a straightforward information-theoretic summarization and (2) achieves a significant speedup in the end-to-end data modeling pipeline.