Paper Title

Relative gradient optimization of the Jacobian term in unsupervised deep learning

Paper Authors

Luigi Gresele, Giancarlo Fissore, Adrián Javaloy, Bernhard Schölkopf, Aapo Hyvärinen

Paper Abstract

Learning expressive probabilistic models correctly describing the data is a ubiquitous problem in machine learning. A popular approach for solving it is mapping the observations into a representation space with a simple joint distribution, which can typically be written as a product of its marginals -- thus drawing a connection with the field of nonlinear independent component analysis. Deep density models have been widely used for this task, but their maximum likelihood based training requires estimating the log-determinant of the Jacobian and is computationally expensive, thus imposing a trade-off between computation and expressive power. In this work, we propose a new approach for exact training of such neural networks. Based on relative gradients, we exploit the matrix structure of neural network parameters to compute updates efficiently even in high-dimensional spaces; the computational cost of the training is quadratic in the input size, in contrast with the cubic scaling of naive approaches. This allows fast training with objective functions involving the log-determinant of the Jacobian, without imposing constraints on its structure, in stark contrast to autoregressive normalizing flows.
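The computational claim in the abstract can be made concrete with a small numerical sketch. The snippet below is illustrative only and not the authors' implementation; the variable names, the toy prior score, and the right-multiplication convention for the relative gradient are our own assumptions. It checks the identity that makes the approach cheap for a single square layer: right-multiplying the Euclidean gradient by WᵀW turns the W⁻ᵀ term coming from log|det W| into W itself, so the update is formed from matrix products alone, quadratic rather than cubic in the input dimension.

```python
# Minimal sketch (not the paper's code): the relative gradient removes the
# matrix inversion needed to differentiate log|det W| for one square layer.
import numpy as np

rng = np.random.default_rng(0)
D, N = 5, 100                      # input dimension, batch size
W = rng.standard_normal((D, D))    # one fully connected (square) layer
X = rng.standard_normal((N, D))    # data batch

# Ordinary (Euclidean) gradient of the log-determinant term:
#   d/dW log|det W| = W^{-T}, which requires an O(D^3) inversion.
euclidean_grad_logdet = np.linalg.inv(W).T

# Relative gradient (one common convention): right-multiply by W^T W.
# For the log-det term this collapses to W itself, so no inverse is needed:
#   W^{-T} W^T W = W.
relative_grad_logdet = euclidean_grad_logdet @ W.T @ W
assert np.allclose(relative_grad_logdet, W)

# The data-fit part of the log-likelihood gets the same right-multiplication;
# everything is a matrix product, at most quadratic in D per sample.
# The tanh score below stands in for some factorized prior (illustrative only).
Z = X @ W.T                              # layer outputs
data_grad = -(np.tanh(Z).T @ X) / N      # hypothetical prior score term
relative_update = (data_grad + euclidean_grad_logdet) @ W.T @ W
# In practice the log-det contribution is replaced directly by W,
# so the inverse is never formed:
relative_update_cheap = data_grad @ W.T @ W + W
assert np.allclose(relative_update, relative_update_cheap)
```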
