Paper Title

To what extent do human explanations of model behavior align with actual model behavior?

Authors

Grusha Prasad, Yixin Nie, Mohit Bansal, Robin Jia, Douwe Kiela, Adina Williams

Abstract

Given the increasingly prominent role NLP models play in our lives, it is important for human expectations of model behavior to align with actual model behavior. Using Natural Language Inference (NLI) as a case study, we investigate the extent to which human-generated explanations of models' inference decisions align with how models actually make these decisions. More specifically, we define three alignment metrics that quantify how well natural language explanations align with model sensitivity to input words, as measured by integrated gradients. Then, we evaluate eight different models (the base and large versions of BERT, RoBERTa and ELECTRA, as well as an RNN and a bag-of-words model), and find that the BERT-base model has the highest alignment with human-generated explanations, for all alignment metrics. Focusing on transformers, we find that the base versions tend to have higher alignment with human-generated explanations than their larger counterparts, suggesting that increasing the number of model parameters leads, in some cases, to worse alignment with human explanations. Finally, we find that a model's alignment with human explanations is not predicted by the model's accuracy, suggesting that accuracy and alignment are complementary ways to evaluate models.
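
The abstract measures model sensitivity to input words with integrated gradients and then scores how well those sensitivities line up with human explanations. Below is a minimal sketch of that general idea, not the paper's implementation: a toy classifier, a Riemann-sum approximation of integrated gradients over token embeddings, and a simple top-k overlap score against words a (hypothetical) human explanation highlights. The model, token ids, explanation indices, and the overlap score are all illustrative assumptions; the paper defines three more involved alignment metrics.

```python
# Sketch only: integrated-gradients token attributions for a toy NLI-style
# classifier, compared against hypothetical human-highlighted tokens.
import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB, DIM, CLASSES = 100, 16, 3  # e.g. entailment / neutral / contradiction


class ToyNLIModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.clf = nn.Linear(DIM, CLASSES)

    def forward_from_embeddings(self, emb):      # emb: (seq_len, DIM)
        return self.clf(emb.mean(dim=0))          # bag-of-embeddings classifier


model = ToyNLIModel()
tokens = torch.tensor([5, 17, 42, 8])             # hypothetical token ids
target = 0                                        # label index to explain

# Integrated gradients: interpolate from a zero baseline to the real embeddings,
# accumulate gradients of the target logit, and average (Riemann approximation).
inputs = model.emb(tokens).detach()
baseline = torch.zeros_like(inputs)
steps = 50
total_grads = torch.zeros_like(inputs)
for k in range(1, steps + 1):
    point = baseline + (k / steps) * (inputs - baseline)
    point.requires_grad_(True)
    logit = model.forward_from_embeddings(point)[target]
    grad, = torch.autograd.grad(logit, point)
    total_grads += grad
attributions = (inputs - baseline) * total_grads / steps
token_scores = attributions.sum(dim=-1).abs()     # one sensitivity score per token

# Toy "alignment": fraction of the k most sensitive tokens that a (hypothetical)
# human explanation also marks as important.
explanation_tokens = {1, 2}                       # assumed human-highlighted indices
k = 2
top_k = set(token_scores.topk(k).indices.tolist())
alignment = len(top_k & explanation_tokens) / k
print(f"token scores: {token_scores.tolist()}")
print(f"alignment@{k}: {alignment:.2f}")
```

In practice one would extract the highlighted words from the natural language explanations and compute attributions on the trained NLI models themselves; this toy overlap score only illustrates the shape of such a comparison.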
