Title
Transformer Interpretability Beyond Attention Visualization
Authors
Abstract
Self-attention techniques, and specifically Transformers, are dominating the field of text processing and are becoming increasingly popular in computer vision classification tasks. In order to visualize the parts of the image that led to a certain classification, existing methods either rely on the obtained attention maps or employ heuristic propagation along the attention graph. In this work, we propose a novel way to compute relevancy for Transformer networks. The method assigns local relevance based on the Deep Taylor Decomposition principle and then propagates these relevancy scores through the layers. This propagation involves attention layers and skip connections, which challenge existing methods. Our solution is based on a specific formulation that is shown to maintain the total relevancy across layers. We benchmark our method on very recent visual Transformer networks, as well as on a text classification problem, and demonstrate a clear advantage over the existing explainability methods.
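The abstract's key technical claim is that relevance is assigned via the Deep Taylor Decomposition principle and propagated layer by layer while the *total* relevance is conserved. A minimal sketch of that conservation property, using a generic LRP-style z-rule on a single linear layer (this is an illustrative assumption, not the paper's specific rule for attention layers or skip connections):

```python
import numpy as np

def lrp_linear(x, w, relevance_out, eps=1e-9):
    """Redistribute output relevance to inputs in proportion to each
    input's contribution z_ij = x_i * w_ij (LRP z-rule sketch)."""
    z = x[:, None] * w                          # contributions, shape (in, out)
    denom = z.sum(axis=0)                       # total contribution per output
    denom = denom + eps * np.where(denom >= 0, 1.0, -1.0)  # stabilizer
    return (z / denom) @ relevance_out          # input relevance, shape (in,)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                 # toy activations
w = rng.normal(size=(4, 3))            # toy weights
r_out = np.abs(rng.normal(size=3))     # relevance arriving at the outputs

r_in = lrp_linear(x, w, r_out)
# Conservation: the relevance totals before and after propagation match.
print(r_in.sum(), r_out.sum())
```

The paper's contribution lies in extending this kind of conservation guarantee to the components that make it non-trivial in Transformers, namely the (non-linear) attention layers and the skip connections.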