Paper Title

MultiViz: Towards Visualizing and Understanding Multimodal Models

Authors

Paul Pu Liang, Yiwei Lyu, Gunjan Chhablani, Nihal Jain, Zihao Deng, Xingbo Wang, Louis-Philippe Morency, Ruslan Salakhutdinov

Abstract

The promise of multimodal models for real-world applications has inspired research in visualizing and understanding their internal mechanics with the end goal of empowering stakeholders to visualize model behavior, perform model debugging, and promote trust in machine learning models. However, modern multimodal models are typically black-box neural networks, which makes it challenging to understand their internal mechanics. How can we visualize the internal modeling of multimodal interactions in these models? Our paper aims to fill this gap by proposing MultiViz, a method for analyzing the behavior of multimodal models by scaffolding the problem of interpretability into 4 stages: (1) unimodal importance: how each modality contributes towards downstream modeling and prediction, (2) cross-modal interactions: how different modalities relate to each other, (3) multimodal representations: how unimodal and cross-modal interactions are represented in decision-level features, and (4) multimodal prediction: how decision-level features are composed to make a prediction. MultiViz is designed to operate on diverse modalities, models, tasks, and research areas. Through experiments on 8 trained models across 6 real-world tasks, we show that the complementary stages in MultiViz together enable users to (1) simulate model predictions, (2) assign interpretable concepts to features, (3) perform error analysis on model misclassifications, and (4) use insights from error analysis to debug models. MultiViz is publicly available, will be regularly updated with new interpretation tools and metrics, and welcomes inputs from the community.
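Of the four stages, unimodal importance is the easiest to make concrete. Below is a minimal sketch, not the MultiViz API, of one standard way to estimate it: gradient-times-input attribution aggregated per modality on a toy two-modality classifier. The model and function names (ToyFusionModel, unimodal_importance) are illustrative assumptions, not from the paper.

```python
# Minimal sketch of stage (1), unimodal importance, via gradient-times-input.
# ToyFusionModel and unimodal_importance are hypothetical names for illustration.
import torch
import torch.nn as nn

class ToyFusionModel(nn.Module):
    """Late-fusion classifier over an image vector and a text vector."""
    def __init__(self, img_dim=16, txt_dim=8, n_classes=3):
        super().__init__()
        self.img_enc = nn.Linear(img_dim, 32)
        self.txt_enc = nn.Linear(txt_dim, 32)
        self.head = nn.Linear(64, n_classes)

    def forward(self, img, txt):
        fused = torch.cat([torch.relu(self.img_enc(img)),
                           torch.relu(self.txt_enc(txt))], dim=-1)
        return self.head(fused)

def unimodal_importance(model, img, txt, target):
    """Sum |gradient * input| separately over each modality's features."""
    img = img.clone().requires_grad_(True)
    txt = txt.clone().requires_grad_(True)
    score = model(img, txt)[0, target]   # logit of the class under inspection
    score.backward()
    return {"image": (img.grad * img).abs().sum().item(),
            "text": (txt.grad * txt).abs().sum().item()}

model = ToyFusionModel()
img, txt = torch.randn(1, 16), torch.randn(1, 8)
pred = model(img, txt).argmax(dim=-1).item()
print(unimodal_importance(model, img, txt, pred))
```

Comparing the two aggregate scores indicates which modality the prediction leans on for a given input; the later stages (cross-modal interactions, representations, and prediction composition) require the richer tooling described in the paper.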
