Paper title
Conditional Autoregressors are Interpretable Classifiers
Paper authors
Paper abstract
We explore the use of class-conditional autoregressive (CA) models to perform image classification on MNIST-10. Autoregressive models assign probability to an entire input by combining probabilities from each individual feature; hence classification decisions made by a CA can be readily decomposed into contributions from each input feature. That is to say, CAs are inherently locally interpretable. Our experiments show that a naively trained CA achieves much worse accuracy than a standard classifier; however, this is due to over-fitting rather than a lack of expressive power. Using knowledge distillation from a standard classifier, a student CA can be trained to match the performance of the teacher while still being interpretable.
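The decomposition the abstract describes can be sketched in a few lines. This is a hypothetical illustration, not the paper's code: the class-conditional model here is a toy per-class Bernoulli table (`theta`) standing in for a real autoregressive network, and the function names (`per_pixel_log_probs`, `classify`) are invented for the example. The point it demonstrates is that the class score is a sum of per-pixel log-probabilities, so each pixel's contribution to the decision can be read off directly.

```python
import numpy as np

# Toy stand-in for a class-conditional autoregressive model (assumption:
# a real CA would condition each pixel on the preceding pixels as well).
rng = np.random.default_rng(0)
NUM_CLASSES, NUM_PIXELS = 10, 16
theta = rng.uniform(0.1, 0.9, size=(NUM_CLASSES, NUM_PIXELS))

def per_pixel_log_probs(x, cls):
    """log p(x_i | y=cls) for every binary pixel i of image x."""
    p = theta[cls]
    return np.where(x == 1, np.log(p), np.log(1.0 - p))

def classify(x, log_prior=None):
    """Classify x via argmax_y [log p(x|y) + log p(y)].

    Returns the predicted class and a (classes x pixels) matrix of
    per-pixel log-probability contributions: row `pred` explains the
    decision pixel by pixel, which is the interpretability claim.
    """
    if log_prior is None:
        log_prior = np.full(NUM_CLASSES, -np.log(NUM_CLASSES))  # uniform prior
    contribs = np.stack(
        [per_pixel_log_probs(x, c) for c in range(NUM_CLASSES)]
    )
    log_joint = contribs.sum(axis=1) + log_prior
    return int(np.argmax(log_joint)), contribs

x = rng.integers(0, 2, size=NUM_PIXELS)  # a random binary "image"
pred, contribs = classify(x)
```

Because the class score is literally the sum of `contribs[pred]`, sorting that row identifies which pixels pushed the model toward its prediction, with no post-hoc attribution method required.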