Localizing Interpretable Multi-scale informative Patches Derived from Media Classification Task

Paper title

Localizing Interpretable Multi-scale informative Patches Derived from Media Classification Task

Authors

Chuanguang Yang, Zhulin An, Xiaolong Hu, Hui Zhu, Yongjun Xu

Abstract

Deep convolutional neural networks (CNNs) typically rely on wider receptive fields (RFs) and more complex non-linearity to achieve state-of-the-art performance, which makes it increasingly difficult to interpret how relevant patches contribute to the final prediction. In this paper, we construct an interpretable AnchorNet equipped with carefully designed RFs and linear spatial aggregation. It provides patch-wise interpretability of the input media while localizing multi-scale informative patches, supervised only on media-level labels without any extra bounding-box annotations. Visualizations of the localized informative image and text patches show the strong multi-scale localization capability of AnchorNet. We further use the localized patches for downstream classification tasks across widely applied networks. Experimental results demonstrate that replacing the original inputs with their patches for classification yields a clear inference speedup with only a small performance degradation, which shows that the localized patches retain most of the semantics and evidence of the original inputs.
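
The idea of linear spatial aggregation can be illustrated with a minimal sketch. It assumes that a network with a limited receptive field yields one vector of class logits per patch, and that the media-level logits are the plain average of those vectors; under that assumption each patch's contribution to the prediction can be read off directly. The function names, the averaging rule, the top-k selection, and the numbers are all hypothetical and chosen for illustration; this is not the AnchorNet implementation.

```python
# Illustrative sketch only: the logits below are made-up numbers,
# not outputs of AnchorNet, and averaging is an assumed aggregation rule.

def aggregate_logits(patch_logits):
    """Average per-patch class logits into media-level logits."""
    num_patches = len(patch_logits)
    num_classes = len(patch_logits[0])
    return [
        sum(patch[c] for patch in patch_logits) / num_patches
        for c in range(num_classes)
    ]


def top_patches(patch_logits, class_index, k):
    """Return indices of the k patches with the largest logit for class_index."""
    order = sorted(
        range(len(patch_logits)),
        key=lambda i: patch_logits[i][class_index],
        reverse=True,
    )
    return order[:k]


# Hypothetical logits for 4 patches and 2 classes.
patch_logits = [
    [0.5, 1.0],
    [4.0, 0.0],
    [1.5, 1.0],
    [2.0, 2.0],
]

media_logits = aggregate_logits(patch_logits)
# The predicted class is the one with the largest media-level logit.
predicted = max(range(len(media_logits)), key=lambda c: media_logits[c])
# The patches that contribute most to the predicted class.
informative = top_patches(patch_logits, predicted, k=2)
```

Because the aggregation is linear, ranking patches by their logit for the predicted class is the same as ranking them by their contribution to the final score, which is what makes the selected patches interpretable in this sketch.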
