Paper Title
All You Need is a Good Functional Prior for Bayesian Deep Learning
Paper Authors
Paper Abstract
The Bayesian treatment of neural networks dictates that a prior distribution is specified over their weight and bias parameters. This poses a challenge because modern neural networks are characterized by a large number of parameters, and the choice of these priors has an uncontrolled effect on the induced functional prior, which is the distribution of the functions obtained by sampling the parameters from their prior distribution. We argue that this is a hugely limiting aspect of Bayesian deep learning, and this work tackles the limitation in a practical and effective way. Our proposal is to reason in terms of functional priors, which are easier to elicit, and to "tune" the priors of neural network parameters so that they reflect such functional priors. Gaussian processes offer a rigorous framework to define prior distributions over functions, and we propose a novel and robust framework to match the functional prior of neural networks to a Gaussian process prior by minimizing the Wasserstein distance between them. We provide extensive experimental evidence that coupling these priors with scalable Markov chain Monte Carlo sampling offers systematically large performance improvements over alternative choices of priors and over state-of-the-art approximate Bayesian deep learning approaches. We consider this work a considerable step toward meeting the long-standing challenge of carrying out a fully Bayesian treatment of neural networks, including convolutional neural networks.
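To make the abstract's central idea concrete: the prior scales of a Bayesian neural network's weights and biases are treated as tunable hyperparameters, optimized so that the induced distribution over functions, evaluated at a finite set of measurement points, is close to a target Gaussian process prior. The sketch below illustrates this in PyTorch. It is a minimal illustration under stated assumptions, not the paper's implementation: the paper estimates a Wasserstein-1 distance with a Lipschitz-constrained potential network, whereas this sketch substitutes a simpler sliced 2-Wasserstein estimator, and the layer sizes, kernel hyperparameters, optimizer settings, and helper names (`sample_bnn_functions`, `sliced_wasserstein`) are all hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# ---- Measurement points and a target GP functional prior (RBF kernel) ----
n_points, n_samples = 64, 256
X = torch.linspace(-3.0, 3.0, n_points).unsqueeze(-1)  # (n_points, 1)

def rbf_kernel(x, lengthscale=1.0, variance=1.0, jitter=1e-5):
    d2 = (x - x.T) ** 2
    return variance * torch.exp(-0.5 * d2 / lengthscale**2) + jitter * torch.eye(len(x))

gp_prior = torch.distributions.MultivariateNormal(
    torch.zeros(n_points), covariance_matrix=rbf_kernel(X)
)

# ---- A small BNN whose prior scales we tune (illustrative architecture) ----
sizes = [1, 50, 50, 1]
# One learnable prior standard deviation per weight/bias tensor,
# kept positive via a softplus parameterization.
rhos = [nn.Parameter(torch.zeros(())) for _ in range(2 * (len(sizes) - 1))]

def sample_bnn_functions(n):
    """Draw n functions from the BNN prior, evaluated at the points X."""
    h = X.unsqueeze(0).expand(n, -1, -1)                   # (n, n_points, 1)
    for i in range(len(sizes) - 1):
        # Fan-in scaling of the weight prior (an assumption of this sketch).
        sw = nn.functional.softplus(rhos[2 * i]) / sizes[i] ** 0.5
        sb = nn.functional.softplus(rhos[2 * i + 1])
        W = sw * torch.randn(n, sizes[i], sizes[i + 1])    # reparameterized draw
        b = sb * torch.randn(n, 1, sizes[i + 1])
        h = h @ W + b
        if i < len(sizes) - 2:
            h = torch.tanh(h)
    return h.squeeze(-1)                                   # (n, n_points)

def sliced_wasserstein(a, b, n_proj=128):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance
    between two sets of function samples (rows = samples)."""
    proj = torch.randn(a.shape[1], n_proj)
    proj = proj / proj.norm(dim=0, keepdim=True)           # unit directions
    pa, _ = torch.sort(a @ proj, dim=0)                    # sorted 1D projections
    pb, _ = torch.sort(b @ proj, dim=0)
    return ((pa - pb) ** 2).mean()

# ---- Tune the BNN prior scales to match the GP functional prior ----
opt = torch.optim.Adam(rhos, lr=5e-2)
for step in range(500):
    opt.zero_grad()
    loss = sliced_wasserstein(sample_bnn_functions(n_samples),
                              gp_prior.sample((n_samples,)))
    loss.backward()
    opt.step()
    if step % 100 == 0:
        print(f"step {step}: sliced-W2^2 = {loss.item():.4f}")
```

After this matching step, the tuned prior over weights and biases would be coupled with scalable Markov chain Monte Carlo sampling for posterior inference, as the abstract describes; that inference stage is beyond this sketch.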