A parallelizable model-based approach for marginal and multivariate clustering

Paper title

A parallelizable model-based approach for marginal and multivariate clustering

Authors

de Carvalho, Miguel; Venturini, Gabriel Martos; Svetlošák, Andrej

Abstract

This paper develops a clustering method that takes advantage of the sturdiness of model-based clustering, while attempting to mitigate some of its pitfalls. First, we note that standard model-based clustering likely leads to the same number of clusters per margin, which seems a rather artificial assumption for a variety of datasets. We tackle this issue by specifying a finite mixture model per margin that allows each margin to have a different number of clusters, and then cluster the multivariate data using a strategy game-inspired algorithm, which we call Reign-and-Conquer. Second, since the proposed clustering approach only specifies a model for the margins (but leaves the joint unspecified), it has the advantage of being partially parallelizable; hence, the proposed approach is computationally appealing as well as more tractable for moderate to high dimensions than a 'full' (joint) model-based clustering approach. A battery of numerical experiments on artificial data indicates an overall good performance of the proposed methods in a variety of scenarios, and real datasets are used to showcase their application in practice.
