Paper title
Robust model training and generalisation with Studentising flows
Paper authors
Paper abstract
Normalising flows are tractable probabilistic models that leverage the power of deep learning to describe a wide parametric family of distributions, all while remaining trainable using maximum likelihood. We discuss how these methods can be further improved based on insights from robust (in particular, resistant) statistics. Specifically, we propose to endow flow-based models with fat-tailed latent distributions such as multivariate Student's $t$, as a simple drop-in replacement for the Gaussian distribution used by conventional normalising flows. While robustness brings many advantages, this paper explores two of them: 1) We describe how using fatter-tailed base distributions can give benefits similar to gradient clipping, but without compromising the asymptotic consistency of the method. 2) We also discuss how robust ideas lead to models with reduced generalisation gap and improved held-out data likelihood. Experiments on several different datasets confirm the efficacy of the proposed approach in both regards.
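The core proposal — replacing the Gaussian base distribution with a fat-tailed Student's $t$ — can be illustrated with a minimal one-dimensional sketch. The code below is not the authors' implementation: it uses a toy affine flow and hand-written log-densities purely to show the two effects the abstract claims, namely that an outlier is less surprising under the $t$ base, and that the $t$ score (the gradient of the log-density with respect to $z$) is bounded, giving a clipping-like effect without altering the model's consistency.

```python
import math

def gaussian_logpdf(z):
    # log density of a standard normal base distribution
    return -0.5 * (z * z + math.log(2.0 * math.pi))

def student_t_logpdf(z, nu=4.0):
    # log density of a standard Student's t base with nu degrees of freedom
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi)
            - (nu + 1.0) / 2.0 * math.log1p(z * z / nu))

def flow_loglik(x, base_logpdf, scale=2.0, shift=0.0):
    # toy invertible flow x = scale * z + shift; change of variables
    # adds log|dz/dx| = -log(scale) to the base log-density
    z = (x - shift) / scale
    return base_logpdf(z) - math.log(scale)

# 1) An extreme observation is penalised far less under the fat-tailed base:
x_outlier = 16.0
ll_gauss = flow_loglik(x_outlier, gaussian_logpdf)
ll_t = flow_loglik(x_outlier, student_t_logpdf)

# 2) The score d(log p)/dz is -z for the Gaussian (unbounded in z),
# but -(nu + 1) z / (nu + z^2) for Student's t, which is bounded —
# the clipping-like behaviour described in the abstract.
def gaussian_score(z):
    return -z

def student_t_score(z, nu=4.0):
    return -(nu + 1.0) * z / (nu + z * z)
```

Here the affine flow, the choice `nu=4.0`, and the test point `x_outlier` are illustrative assumptions; in the paper the flow is a deep invertible network and the base is a multivariate Student's $t$. Note that as `nu` grows, the $t$ density and score both approach the Gaussian case, so the Gaussian base is recovered as a limit.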