Paper Title
How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking
Paper Authors
Paper Abstract
Attribution methods assess the contribution of inputs to the model prediction. One way to do so is erasure: a subset of inputs is considered irrelevant if it can be removed without affecting the prediction. Though conceptually simple, erasure's objective is intractable and approximate search remains expensive with modern deep NLP models. Erasure is also susceptible to the hindsight bias: the fact that an input can be dropped does not mean that the model `knows' it can be dropped. The resulting pruning is over-aggressive and does not reflect how the model arrives at the prediction. To deal with these challenges, we introduce Differentiable Masking. DiffMask learns to mask-out subsets of the input while maintaining differentiability. The decision to include or disregard an input token is made with a simple model based on intermediate hidden layers of the analyzed model. First, this makes the approach efficient because we predict rather than search. Second, as with probing classifiers, this reveals what the network `knows' at the corresponding layers. This lets us not only plot attribution heatmaps but also analyze how decisions are formed across network layers. We use DiffMask to study BERT models on sentiment classification and question answering.
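To make the idea concrete, below is a minimal sketch of differentiable masking in the spirit described by the abstract, assuming PyTorch. The names (GatePredictor, sample_gates, diffmask_loss) and hyper-parameters are illustrative assumptions, not the authors' released implementation: a shallow probe reads an intermediate hidden layer of the frozen analyzed model and predicts a keep/drop gate per input token, gates are relaxed with a binary Concrete distribution so they stay differentiable, and the objective keeps the masked prediction close to the original while penalizing the number of kept tokens.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GatePredictor(nn.Module):
    """Shallow probe that predicts a keep/drop gate logit for every input token
    from an intermediate hidden layer of the analyzed (frozen) model."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, 1),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: [batch, seq_len, hidden] -> gate logits [batch, seq_len]
        return self.scorer(hidden_states).squeeze(-1)

def sample_gates(logits: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Relaxed Bernoulli (binary Concrete) sample in (0, 1), differentiable w.r.t. logits."""
    u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
    return torch.sigmoid((logits + torch.log(u) - torch.log(1 - u)) / temperature)

def diffmask_loss(masked_logits: torch.Tensor,
                  original_logits: torch.Tensor,
                  gates: torch.Tensor,
                  sparsity_weight: float = 1.0) -> torch.Tensor:
    """Keep the masked prediction close to the original prediction (KL term)
    while pushing the gates toward zero (sparsity term)."""
    kl = F.kl_div(F.log_softmax(masked_logits, dim=-1),
                  F.softmax(original_logits, dim=-1),
                  reduction="batchmean")
    return kl + sparsity_weight * gates.mean()

In use, the sampled gates would multiply the input token embeddings (or interpolate them toward a learned baseline embedding) before the masked forward pass; only the gate predictor and baseline are trained, while the analyzed model such as BERT stays frozen, so attribution is predicted rather than searched for.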