Paper Title

Disentangling the Predictive Variance of Deep Ensembles through the Neural Tangent Kernel

Paper Authors

Seijin Kobayashi, Pau Vilimelis Aceituno, Johannes von Oswald

Paper Abstract

Identifying unfamiliar inputs, also known as out-of-distribution (OOD) detection, is a crucial property of any decision-making process. A simple and empirically validated technique is based on deep ensembles, where the variance of predictions over different neural networks acts as a substitute for input uncertainty. Nevertheless, a theoretical understanding of the inductive biases behind the performance of deep ensembles' uncertainty estimation is missing. To improve our description of their behavior, we study deep ensembles with large layer widths operating in simplified linear training regimes, in which the functions trained with gradient descent can be described by the neural tangent kernel. We identify two sources of noise, each inducing a distinct inductive bias in the predictive variance at initialization. We further show, theoretically and empirically, that both noise sources affect the predictive variance of non-linear deep ensembles in toy models and realistic settings after training. Finally, we propose practical ways to eliminate part of these noise sources, leading to significant changes and improved OOD detection in trained deep ensembles.
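For context, the large-width "linear training regime" the abstract refers to is the standard neural tangent kernel (NTK) result: a sufficiently wide network trained by gradient descent stays close to its first-order Taylor expansion around its random initialization $\theta_0$,

$$ f_\theta(x) \;\approx\; f_{\theta_0}(x) + \nabla_\theta f_{\theta_0}(x)^\top (\theta - \theta_0), \qquad \Theta(x, x') \;=\; \nabla_\theta f_{\theta_0}(x)^\top \nabla_\theta f_{\theta_0}(x'), $$

so both the trained function and the ensemble's predictive variance are governed by the random initial function $f_{\theta_0}$ and the kernel $\Theta$; how this initialization randomness decomposes into the two distinct noise sources is the paper's contribution.

The ensemble heuristic itself is simple enough to sketch. The snippet below is a minimal illustration of variance-based OOD scoring with a deep ensemble, not the paper's experimental code; the helper names (make_member, ood_score) are hypothetical, and the per-member training loop is omitted for brevity.

```python
import torch
import torch.nn as nn

def make_member(in_dim: int, width: int = 256) -> nn.Module:
    # One ensemble member: a small MLP with its own random initialization.
    return nn.Sequential(
        nn.Linear(in_dim, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, 1),
    )

@torch.no_grad()
def ood_score(ensemble: list[nn.Module], x: torch.Tensor) -> torch.Tensor:
    # Stack member predictions: shape (M, batch, 1).
    preds = torch.stack([member(x) for member in ensemble])
    # Predictive variance across members; higher variance = less familiar input.
    return preds.var(dim=0).squeeze(-1)

# Usage: an ensemble of 5 members scored on a random batch
# (each member would normally be trained on the same data first).
ensemble = [make_member(in_dim=10) for _ in range(5)]
x = torch.randn(32, 10)
print(ood_score(ensemble, x).shape)  # torch.Size([32])
```

Independent random initializations are what make the members disagree off-distribution, which is exactly why the initialization noise sources studied in the paper shape the resulting variance.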
