Paper Title
Neural Representations Reveal Distinct Modes of Class Fitting in Residual Convolutional Networks
Paper Authors
Paper Abstract
We leverage probabilistic models of neural representations to investigate how residual networks fit classes. To this end, we estimate class-conditional density models for representations learned by deep ResNets. We then use these models to characterize the distributions of representations across learned classes. Surprisingly, we find that classes in the investigated models are not fitted in a uniform way. On the contrary, we uncover two groups of classes that are fitted with markedly different distributions of representations. These distinct modes of class fitting are evident only in the deeper layers of the investigated models, indicating that they are not related to low-level image features. We show that the uncovered structure in neural representations correlates with memorization of training examples and with adversarial robustness. Finally, we compare class-conditional distributions of neural representations between memorized and typical examples. This allows us to uncover where in the network structure class labels arise for memorized and standard inputs.
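The abstract does not specify which density family the authors use, so the sketch below is only a minimal illustration of the pipeline it describes: extract penultimate-layer features from a pretrained ResNet, fit one class-conditional density per class (here a Gaussian with diagonal covariance, an assumption), and score per-class log-likelihoods. The model choice, layer choice, and all function names are illustrative, not the paper's implementation.

```python
# Sketch: class-conditional density models over ResNet representations.
# Assumptions (not from the paper): penultimate-layer features,
# diagonal-Gaussian class densities, torchvision's pretrained ResNet-18.
import numpy as np
import torch
import torchvision
from torchvision import transforms

# Expose penultimate-layer features by replacing the classifier head
# with an identity map.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Identity()
model.eval()

# Standard ImageNet preprocessing for the pretrained weights.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(loader):
    """Collect penultimate-layer representations and labels."""
    feats, labels = [], []
    for x, y in loader:
        feats.append(model(x))
        labels.append(y)
    return torch.cat(feats).numpy(), torch.cat(labels).numpy()

def fit_class_gaussians(feats, labels, eps=1e-3):
    """Fit one Gaussian (mean, diagonal variance) per class label."""
    params = {}
    for c in np.unique(labels):
        fc = feats[labels == c]
        params[c] = (fc.mean(axis=0), fc.var(axis=0) + eps)
    return params

def class_log_likelihood(feats, mean, var):
    """Diagonal-Gaussian log-density of each row of `feats`."""
    d = feats.shape[1]
    quad = ((feats - mean) ** 2 / var).sum(axis=1)
    return -0.5 * (quad + np.log(var).sum() + d * np.log(2 * np.pi))

# Usage (assuming `loader` yields preprocessed image batches):
# feats, labels = extract_features(loader)
# gaussians = fit_class_gaussians(feats, labels)
# ll = {c: class_log_likelihood(feats[labels == c], *p)
#       for c, p in gaussians.items()}  # per-class fit quality
```

Comparing the resulting per-class log-likelihood distributions, layer by layer, would be one way to probe whether some classes are fitted with markedly different representation densities, in the spirit of the analysis the abstract reports.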