A Bayesian Fisher-EM algorithm for discriminative Gaussian subspace clustering

Paper title

A Bayesian Fisher-EM algorithm for discriminative Gaussian subspace clustering

Paper authors

Nicolas Jouvin, Charles Bouveyron, Pierre Latouche

Abstract

High-dimensional data clustering remains a challenging task for modern statistics and machine learning, with a wide range of applications. In this work we consider the powerful discriminative latent mixture model and extend it to the Bayesian framework. The data are modeled as a mixture of Gaussians in a low-dimensional discriminative subspace, a Gaussian prior distribution is introduced over the latent group means, and a family of twelve submodels is derived by considering different covariance structures. Model inference is done with a variational EM algorithm, while the discriminative subspace is estimated via a Fisher step that maximizes an unsupervised Fisher criterion. An empirical Bayes procedure is proposed for estimating the prior hyperparameters, and an integrated classification likelihood criterion is derived for selecting both the number of clusters and the submodel. The performance of the resulting Bayesian Fisher-EM algorithm is investigated in two thorough simulated scenarios, covering both dimensionality and noise, and assessing its superiority with respect to state-of-the-art Gaussian subspace clustering models. In addition to standard real-data benchmarks, an application to single-image denoising is proposed, displaying relevant results. This work comes with a reference implementation for the R software in the FisherEM package accompanying the paper.
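
To make the Fisher step mentioned in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation and not the FisherEM package API. It assumes one common form of the unsupervised Fisher criterion, trace((U^T S U)^{-1} U^T S_B U), where S is the total covariance and S_B is a soft between-cluster scatter built from posterior membership probabilities. The function name `fisher_step`, its arguments, and the small ridge term are all illustrative assumptions; the Bayesian variant in the paper may differ in its details.

```python
import numpy as np

def fisher_step(X, T, d):
    """Hypothetical sketch of a Fisher step.

    X : (n, p) data matrix.
    T : (n, K) soft cluster memberships (each row sums to 1).
    d : target subspace dimension.
    Returns a (p, d) matrix with orthonormal columns spanning the top-d
    generalized eigenvectors of the pair (S_B, S).
    """
    n, p = X.shape
    mean = X.mean(axis=0)
    Xc = X - mean
    S = Xc.T @ Xc / n                        # total covariance
    nk = T.sum(axis=0)                       # soft cluster sizes
    M = (T.T @ X) / nk[:, None] - mean       # centered soft cluster means, (K, p)
    S_B = (M.T * nk) @ M / n                 # soft between-cluster scatter

    # Small ridge (an assumption of this sketch) keeps S positive definite.
    S_reg = S + 1e-8 * (np.trace(S) / p) * np.eye(p)
    L = np.linalg.cholesky(S_reg)

    # Whiten: A = L^{-1} S_B L^{-T} is symmetric, so eigh applies.
    A = np.linalg.solve(L, np.linalg.solve(L, S_B).T)
    A = (A + A.T) / 2
    _, evecs = np.linalg.eigh(A)             # eigenvalues in ascending order
    V = evecs[:, ::-1][:, :d]                # top-d eigenvectors

    U = np.linalg.solve(L.T, V)              # map back to the original space
    Q, _ = np.linalg.qr(U)                   # orthonormal basis of the same span
    return Q
```

The trace criterion depends only on the subspace spanned by U, so the final QR orthonormalization leaves its value unchanged while enforcing orthonormal columns. In a full algorithm this step would alternate with the E and M steps, which are omitted here.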
