Paper title
A Bayesian Fisher-EM algorithm for discriminative Gaussian subspace clustering
Authors
Abstract
High-dimensional data clustering has become, and remains, a challenging task for modern statistics and machine learning, with a wide range of applications. In this work, we consider the powerful discriminative latent mixture model and extend it to the Bayesian framework. The data are modeled as a mixture of Gaussians in a low-dimensional discriminative subspace, a Gaussian prior distribution is introduced over the latent group means, and a family of twelve submodels is derived by considering different covariance structures. Model inference is performed with a variational EM algorithm, while the discriminative subspace is estimated via a Fisher-step that maximizes an unsupervised Fisher criterion. An empirical Bayes procedure is proposed for estimating the prior hyperparameters, and an integrated classification likelihood criterion is derived for selecting both the number of clusters and the submodel. The performance of the resulting Bayesian Fisher-EM algorithm is investigated in two thorough simulation scenarios, addressing both dimensionality and noise, and its superiority over state-of-the-art Gaussian subspace clustering models is assessed. In addition to standard real-data benchmarks, an application to single-image denoising is proposed, displaying relevant results. This work comes with a reference implementation for the R software in the FisherEM package accompanying the paper.
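To give a concrete feel for the Fisher-step, here is a minimal NumPy sketch, not the FisherEM package's R implementation. It assumes cluster memberships T (hard or soft, as an E-step would produce) are given. It then maximizes the trace form tr((U^T S U)^{-1} U^T S_B U) of an unsupervised Fisher criterion over orthonormal p x d bases U, where S is the total scatter and S_B is the membership-weighted between-cluster scatter. The maximizer is obtained from the leading generalized eigenvectors of (S_B, S), orthonormalized by QR. The function names (scatter_matrices, fisher_criterion, fisher_step) and the ridge parameter are hypothetical, and the exact procedure used in the paper may differ.

```python
import numpy as np


def scatter_matrices(X, T):
    """Total scatter S and membership-weighted between-cluster scatter S_B.

    X: (n, p) data matrix; T: (n, K) memberships, rows summing to 1.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                # center the data
    S = Xc.T @ Xc / n                      # total scatter
    nk = T.sum(axis=0)                     # (soft) cluster sizes, shape (K,)
    Mk = (T.T @ Xc) / nk[:, None]          # (soft) cluster means, shape (K, p)
    SB = (Mk.T * nk) @ Mk / n              # sum_k (n_k / n) m_k m_k^T
    return S, SB


def fisher_criterion(U, S, SB):
    """Unsupervised Fisher criterion tr((U^T S U)^{-1} U^T S_B U)."""
    return float(np.trace(np.linalg.solve(U.T @ S @ U, U.T @ SB @ U)))


def fisher_step(X, T, d, ridge=1e-8):
    """Orthonormal (p, d) basis maximizing the trace criterion (sketch)."""
    p = X.shape[1]
    S, SB = scatter_matrices(X, T)
    # Hypothetical small ridge keeps S positive definite when p is large vs n
    L = np.linalg.cholesky(S + ridge * np.trace(S) / p * np.eye(p))
    Linv = np.linalg.inv(L)
    # S_B v = lam S v  <=>  (L^-1 S_B L^-T) w = lam w, with v = L^-T w
    evals, W = np.linalg.eigh(Linv @ SB @ Linv.T)   # ascending eigenvalues
    V = Linv.T @ W[:, ::-1][:, :d]                  # top-d generalized eigenvectors
    # The criterion is invariant to U -> U A (A invertible), so an orthonormal
    # basis of the same span attains the same maximal value
    U, _ = np.linalg.qr(V)
    return U
```

In a full Fisher-EM-style loop, T would be refreshed by the E-step (here, the variational E-step of the Bayesian model) before each call, and d is typically chosen no larger than K - 1, since S_B has rank at most K - 1.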