Model Based Co-clustering of Mixed Numerical and Binary Data

Paper Title

Model Based Co-clustering of Mixed Numerical and Binary Data

Authors

Aichetou Bouchareb, Marc Boullé, Fabrice Clérot, Fabrice Rossi

Abstract

Co-clustering is a data mining technique that extracts the underlying block structure between the rows and columns of a data matrix. Many approaches have been studied and have proven able to extract such structures from continuous data, binary data, or contingency tables. However, very little work has addressed co-clustering of mixed-type data. In this article, we extend co-clustering based on latent block models to mixed data (continuous and binary variables). We then evaluate the effectiveness of the proposed approach on simulated data and discuss its advantages and potential limitations.
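
To make the idea of a block structure in mixed data concrete, the following is a minimal sketch of how a toy data set of this kind could be simulated. It is not the authors' procedure: the function name `simulate_mixed_blocks`, the two-by-two cluster layout, the Gaussian means, the Bernoulli probabilities, and the unit variance are all assumptions chosen for illustration. The sketch only generates data with a planted structure; it does not perform the latent block model inference described in the abstract.

```python
import random


def simulate_mixed_blocks(n_rows=60, n_cont=4, n_bin=4, seed=0):
    """Draw a toy mixed-type matrix with a planted block structure.

    All sizes and block parameters are illustrative assumptions,
    not values taken from the paper.
    """
    rng = random.Random(seed)

    # Latent row clusters (2 groups) and column clusters (2 per variable type).
    row_labels = [rng.randrange(2) for _ in range(n_rows)]
    cont_labels = [j % 2 for j in range(n_cont)]
    bin_labels = [j % 2 for j in range(n_bin)]

    # Block parameters indexed by [row cluster][column cluster]:
    # Gaussian means for continuous columns, success probabilities
    # for binary columns.
    means = [[0.0, 3.0], [3.0, 0.0]]
    probs = [[0.1, 0.9], [0.9, 0.1]]

    rows = []
    for k in row_labels:
        # Continuous part: one Gaussian draw per continuous column.
        cont = [rng.gauss(means[k][l], 1.0) for l in cont_labels]
        # Binary part: one Bernoulli draw per binary column.
        binary = [1 if rng.random() < probs[k][l] else 0 for l in bin_labels]
        rows.append(cont + binary)
    return rows, row_labels, cont_labels, bin_labels


data, row_labels, cont_labels, bin_labels = simulate_mixed_blocks()
print(len(data), len(data[0]))
```

Each row of `data` holds the continuous columns first and the binary columns last. A co-clustering method for mixed data would aim to recover `row_labels` together with the column partitions from `data` alone.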
