Paper title

Model-based clustering of categorical data based on the Hamming distance

Authors

Raffaele Argiento, Edoardo Filippi-Mazzola, Lucia Paci

Abstract

A model-based approach is developed for clustering categorical data with no natural ordering. The proposed method exploits the Hamming distance to define a family of probability mass functions to model the data. The elements of this family are then considered as kernels of a finite mixture model with an unknown number of components. Conjugate Bayesian inference has been derived for the parameters of the Hamming distribution model. The mixture is framed in a Bayesian nonparametric setting, and a transdimensional blocked Gibbs sampler is developed to provide full Bayesian inference on the number of clusters, their structure, and the group-specific parameters, facilitating the computation with respect to customary reversible jump algorithms. The proposed model encompasses a parsimonious latent class model as a special case when the number of components is fixed. Model performances are assessed via a simulation study and reference datasets, showing improvements in clustering recovery over existing approaches.
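
To make the idea of a Hamming-distance-based probability mass function concrete, here is a minimal Python sketch. The abstract does not give the exact form of the family, so the sketch assumes a kernel proportional to exp(-d(x, c) / sigma), where d is the Hamming distance to a center c and sigma is a single shared scale. The function names, the parameter names, and the single shared scale are illustrative assumptions, not the authors' specification.

```python
import math


def hamming_distance(x, c):
    """Count the positions where two equal-length categorical vectors differ."""
    if len(x) != len(c):
        raise ValueError("vectors must have the same length")
    return sum(xi != ci for xi, ci in zip(x, c))


def hamming_pmf(x, center, sigma, n_levels):
    """Illustrative pmf proportional to exp(-hamming_distance(x, center) / sigma).

    ASSUMPTION: this functional form and the single scale `sigma` are chosen
    for illustration; the abstract does not state them.
    n_levels[j] is the number of categories of attribute j.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Weight given to a mismatch on any single attribute.
    w = math.exp(-1.0 / sigma)
    # The kernel factorises over attributes: attribute j has one matching
    # level (weight 1) and n_levels[j] - 1 mismatching levels (weight w each).
    log_norm = sum(math.log(1.0 + (m - 1) * w) for m in n_levels)
    return math.exp(-hamming_distance(x, center) / sigma - log_norm)
```

Under these assumptions, probability decays with the number of attributes on which an observation disagrees with the center. A small sigma concentrates mass on the center, while a large sigma approaches the uniform distribution over all configurations.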
