Paper Title
MM-SHAP: A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision and Language Models & Tasks
Paper Authors
Paper Abstract
Vision and language models (VL) are known to exploit unrobust indicators in individual modalities (e.g., introduced by distributional biases) instead of focusing on the relevant information in each modality. When a unimodal model achieves accuracy on a VL task similar to that of a multimodal one, this indicates that so-called unimodal collapse has occurred. However, accuracy-based tests fail to detect cases where, e.g., the model prediction is wrong even though the model used relevant information from a modality. Instead, we propose MM-SHAP, a performance-agnostic multimodality score based on Shapley values that reliably quantifies the proportions in which a multimodal model uses individual modalities. We apply MM-SHAP in two ways: (1) to compare models by their average degree of multimodality, and (2) to measure, for individual models, the contribution of individual modalities on different tasks and datasets. Experiments with six VL models -- LXMERT, CLIP and four ALBEF variants -- on four VL tasks highlight that unimodal collapse can occur to different degrees and in different directions, contradicting the widespread assumption that unimodal collapse is one-sided. Based on our results, we recommend MM-SHAP for analysing multimodal tasks, to diagnose and guide progress towards multimodal integration. Code available at \url{https://github.com/Heidelberg-NLP/MM-SHAP}.
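To make the idea of a modality-proportion score concrete, the following is a minimal sketch (not the authors' implementation) of how such proportions could be aggregated once per-token Shapley values have been estimated for a single model prediction: absolute Shapley values are summed per modality and normalised, yielding a textual share and a visual share. The function name mm_shap, the token ordering (text tokens first, image patches after), and the example values are hypothetical illustrations, not part of the paper.

import numpy as np

def mm_shap(shapley_values, num_text_tokens):
    # shapley_values: 1-D array of per-token Shapley values for one prediction,
    #                 with text tokens first and image patches after (assumed ordering).
    # num_text_tokens: number of leading entries that belong to the text modality.
    phi = np.abs(np.asarray(shapley_values, dtype=float))
    text_contrib = phi[:num_text_tokens].sum()
    image_contrib = phi[num_text_tokens:].sum()
    total = text_contrib + image_contrib
    if total == 0.0:
        return 0.5, 0.5  # degenerate case: no measurable contribution from either modality
    text_share = text_contrib / total
    return text_share, 1.0 - text_share  # (textual share, visual share)

# Hypothetical example: 4 text tokens and 4 image patches.
phi = [0.30, -0.10, 0.05, 0.15, 0.02, -0.01, 0.03, 0.04]
t_share, v_share = mm_shap(phi, num_text_tokens=4)
print(f"text share = {t_share:.2f}, visual share = {v_share:.2f}")

Under this reading, shares near 0.5 for both modalities would indicate balanced use, while a share close to 1.0 for one modality would signal collapse onto that modality, regardless of whether the prediction itself is correct.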