Paper title
A parallelizable model-based approach for marginal and multivariate clustering
Authors
Abstract
This paper develops a clustering method that takes advantage of the sturdiness of model-based clustering while attempting to mitigate some of its pitfalls. First, we note that standard model-based clustering likely leads to the same number of clusters per margin, which seems a rather artificial assumption for a variety of datasets. We tackle this issue by specifying a finite mixture model per margin, which allows each margin to have a different number of clusters, and then cluster the multivariate data using a strategy game-inspired algorithm that we call Reign-and-Conquer. Second, since the proposed clustering approach only specifies a model for the margins and leaves the joint distribution unspecified, it has the advantage of being partially parallelizable; hence, the proposed approach is computationally appealing as well as more tractable for moderate to high dimensions than a 'full' (joint) model-based clustering approach. A battery of numerical experiments on artificial data indicates an overall good performance of the proposed methods in a variety of scenarios, and real datasets are used to showcase their application in practice.
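To make the per-margin idea concrete, the following is a minimal sketch of the marginal step only; it is not the paper's Reign-and-Conquer algorithm, which the abstract does not specify. It assumes Gaussian mixture components, a plain EM fit, and BIC to choose the number of clusters in each margin, none of which are stated in the abstract. The names `fit_mixture_1d`, `choose_k`, and `component_densities` are hypothetical. Because each margin is fitted independently, the fits can be dispatched to separate workers.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def component_densities(v, weights, means, variances):
    """Weighted Gaussian density of each mixture component at the point v."""
    return [w * math.exp(-0.5 * (v - m) ** 2 / s) / math.sqrt(2 * math.pi * s)
            for w, m, s in zip(weights, means, variances)]


def fit_mixture_1d(x, k, n_iter=100):
    """Fit a k-component 1-D Gaussian mixture by EM (assumed model, not the paper's)."""
    n = len(x)
    xs = sorted(x)
    centre = sum(x) / n
    spread = sum((v - centre) ** 2 for v in x) / n
    floor = 1e-6 * spread + 1e-12  # variance floor to avoid degenerate components
    # Deterministic start: means at evenly spaced sample quantiles.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [spread + floor] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for v in x:
            dens = component_densities(v, weights, means, variances)
            total = sum(dens) + 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            var_j = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, x)) / nj
            variances[j] = max(var_j, floor)
    loglik = sum(
        math.log(sum(component_densities(v, weights, means, variances)) + 1e-300)
        for v in x
    )
    return loglik, (weights, means, variances)


def choose_k(x, k_max=3):
    """Pick the number of components for one margin by minimising BIC (an assumption)."""
    n = len(x)
    best_k, best_bic = 1, math.inf
    for k in range(1, k_max + 1):
        loglik, _ = fit_mixture_1d(x, k)
        bic = -2.0 * loglik + (3 * k - 1) * math.log(n)  # 3k - 1 free parameters
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k


# Synthetic illustration: one bimodal margin and one unimodal margin.
rng = random.Random(1)
margin_a = [rng.gauss(-5, 1) for _ in range(150)] + [rng.gauss(5, 1) for _ in range(150)]
margin_b = [rng.gauss(0, 1) for _ in range(300)]

# Each margin is handled by an independent task; threads are used here for
# simplicity, and a process pool would be the natural swap for CPU-bound work.
with ThreadPoolExecutor() as pool:
    chosen = list(pool.map(choose_k, [margin_a, margin_b]))
print(chosen)
```

The point of the sketch is only that the number of clusters is selected separately for each margin, so different margins are free to end up with different counts. How the marginal fits are then combined into multivariate clusters is the role of the paper's own algorithm and is not attempted here.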