Fast and explainable clustering based on sorting

Paper title

Fast and explainable clustering based on sorting

Authors

Xinye Chen, Stefan Güttel

Abstract

We introduce a fast and explainable clustering method called CLASSIX. It consists of two phases, namely a greedy aggregation phase of the sorted data into groups of nearby data points, followed by the merging of groups into clusters. The algorithm is controlled by two scalar parameters, namely a distance parameter for the aggregation and another parameter controlling the minimal cluster size. Extensive experiments are conducted to give a comprehensive evaluation of the clustering performance on synthetic and real-world datasets, with various cluster shapes and low to high feature dimensionality. Our experiments demonstrate that CLASSIX competes with state-of-the-art clustering algorithms. The algorithm has linear space complexity and achieves near-linear time complexity on a wide range of problems. Its inherent simplicity allows for the generation of intuitive explanations of the computed clusters.
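The two phases described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`aggregate`, `merge`, `cluster`), the sort key (the first coordinate), the merge rule (groups whose starting points lie within `scale * radius` of each other), and the handling of undersized clusters (labeled `-1`) are all assumptions made for illustration; the abstract specifies only the two phases and the two parameters.

```python
from math import dist


def aggregate(points, radius):
    """Phase 1 (sketch): greedy aggregation of sorted data into groups."""
    n = len(points)
    # Assumption: sort by the first coordinate; the abstract does not name the key.
    order = sorted(range(n), key=lambda i: points[i][0])
    group_of = [-1] * n   # group index per point, -1 = not yet assigned
    starts = []           # index of the starting point of each group
    for pos, i in enumerate(order):
        if group_of[i] != -1:
            continue
        group_of[i] = len(starts)
        starts.append(i)
        for k in range(pos + 1, n):
            j = order[k]
            # The gap in the first coordinate never exceeds the Euclidean
            # distance, so once it is larger than radius we can stop early.
            if points[j][0] - points[i][0] > radius:
                break
            if group_of[j] == -1 and dist(points[i], points[j]) <= radius:
                group_of[j] = group_of[i]
    return group_of, starts


def merge(points, group_of, starts, radius, min_size, scale=1.5):
    """Phase 2 (sketch): merge groups into clusters with union-find."""
    parent = list(range(len(starts)))

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    # Assumed merge rule: starting points closer than scale * radius.
    # This all-pairs loop is quadratic in the number of groups and is
    # kept only for brevity.
    for a in range(len(starts)):
        for b in range(a + 1, len(starts)):
            if dist(points[starts[a]], points[starts[b]]) <= scale * radius:
                parent[find(a)] = find(b)

    roots = [find(g) for g in group_of]
    sizes = {}
    for r in roots:
        sizes[r] = sizes.get(r, 0) + 1
    # Assumption: clusters below min_size are labeled -1 instead of reassigned.
    ids, labels = {}, []
    for r in roots:
        labels.append(-1 if sizes[r] < min_size else ids.setdefault(r, len(ids)))
    return labels


def cluster(points, radius, min_size):
    group_of, starts = aggregate(points, radius)
    return merge(points, group_of, starts, radius, min_size)


blobs = [(0, 0), (0.1, 0), (0, 0.1), (0.2, 0.1),
         (5, 5), (5.1, 5), (5, 5.1), (10, 0)]
print(cluster(blobs, radius=0.5, min_size=2))
```

In this sketch the sort pays off in the inner loop of `aggregate`: each starting point is compared only against points whose sort key lies within `radius` of its own. Because every point records its group and every group its starting point, an assignment can be traced back through those two steps, which is one plausible reading of the explainability the abstract claims.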
