Paper Title

Robust Unsupervised Learning via L-Statistic Minimization

Authors

Andreas Maurer, Daniela A. Parletta, Andrea Paudice, Massimiliano Pontil

Abstract


Designing learning algorithms that are resistant to perturbations of the underlying data distribution is a problem of wide practical and theoretical importance. We present a general approach to this problem focusing on unsupervised learning. The key assumption is that the perturbing distribution is characterized by larger losses relative to a given class of admissible models. This is exploited by a general descent algorithm which minimizes an $L$-statistic criterion over the model class, weighting small losses more. Our analysis characterizes the robustness of the method in terms of bounds on the reconstruction error relative to the underlying unperturbed distribution. As a byproduct, we prove uniform convergence bounds with respect to the proposed criterion for several popular models in unsupervised learning, a result which may be of independent interest. Numerical experiments with k-means clustering and principal subspace analysis demonstrate the effectiveness of our approach.
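The descent procedure the abstract describes, minimizing an L-statistic criterion that weights small losses more, can be sketched for the k-means case roughly as follows. This is a minimal illustration, not the paper's exact algorithm: the function name, step size, and the particular rank-based weighting scheme are assumptions made here for concreteness.

```python
import numpy as np

def l_statistic_kmeans_step(X, centers, weights, lr=0.1):
    """One (sub)gradient step on an L-statistic k-means criterion.

    Illustrative sketch: `weights` is a vector over loss ranks, assumed
    non-increasing, so small losses count more and large losses
    (e.g. from perturbed points) count less or not at all.
    """
    # Per-point loss: squared distance to the nearest center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    assign = d2.argmin(axis=1)
    losses = d2[np.arange(len(X)), assign]
    # L-statistic weighting: the i-th smallest loss receives weights[i].
    w = np.zeros(len(X))
    w[np.argsort(losses)] = weights
    # Subgradient of sum_i w_i * ||x_i - c_{assign(i)}||^2 in the centers.
    grad = np.zeros_like(centers)
    for j in range(len(centers)):
        mask = assign == j
        grad[j] = 2.0 * (w[mask, None] * (centers[j] - X[mask])).sum(axis=0)
    return centers - lr * grad
```

With `weights` uniform over the smallest losses and zero on the rest, the criterion reduces to a trimmed k-means objective, one simple member of the L-statistic family.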
