Constrained Clustering and Multiple Kernel Learning without Pairwise Constraint Relaxation

Paper Title


Constrained Clustering and Multiple Kernel Learning without Pairwise Constraint Relaxation

Authors

Benedikt Boecking, Vincent Jeanselme, Artur Dubrawski

Abstract


Clustering under pairwise constraints is an important knowledge discovery tool that enables the learning of appropriate kernels or distance metrics to improve clustering performance. These pairwise constraints, which come in the form of must-link and cannot-link pairs, arise naturally in many applications and are intuitive for users to provide. However, the common practice of relaxing discrete constraints to a continuous domain to ease optimization when learning kernels or metrics can harm generalization, because information that only encodes linkage is transformed into information about distances. We introduce a new constrained clustering algorithm that jointly clusters data and learns a kernel in accordance with the available pairwise constraints. To generalize well, our method is designed to maximize constraint satisfaction without relaxing pairwise constraints to a continuous domain where they inform distances. We show that the proposed method outperforms existing approaches on a large number of diverse publicly available datasets, and we discuss how our method can scale to large datasets.
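To make the distinction between discrete and relaxed constraints concrete, here is a minimal sketch of two generic building blocks. It is not the authors' algorithm, which jointly learns the clustering and the kernel; the function names `constraint_satisfaction` and `combine_kernels` are hypothetical. The first function scores must-link and cannot-link pairs as plain yes/no conditions on cluster membership, so no distance between points enters the count. The second assumes the common multiple kernel learning parameterization of a convex combination of base kernel matrices; the abstract does not state which parameterization the paper uses, and how the weights are learned is not shown here.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]
Matrix = Sequence[Sequence[float]]


def constraint_satisfaction(
    labels: Sequence[int],
    must_link: Sequence[Pair],
    cannot_link: Sequence[Pair],
) -> float:
    """Fraction of pairwise constraints satisfied by a hard cluster assignment.

    Each constraint is a discrete condition on cluster membership only;
    it is never converted into a target distance between the two points.
    """
    satisfied = 0
    for i, j in must_link:
        # Must-link: both points must be assigned to the same cluster.
        satisfied += labels[i] == labels[j]
    for i, j in cannot_link:
        # Cannot-link: the points must be assigned to different clusters.
        satisfied += labels[i] != labels[j]
    total = len(must_link) + len(cannot_link)
    # With no constraints there is nothing to violate.
    return satisfied / total if total else 1.0


def combine_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> List[List[float]]:
    """Convex combination sum_m w_m * K_m of base kernel matrices (assumed parameterization)."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per base kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * K[r][c] for w, K in zip(weights, kernels)) for c in range(n)]
        for r in range(n)
    ]


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    must_link = [(0, 1), (1, 2)]  # (1, 2) is violated: clusters 0 and 1
    cannot_link = [(0, 3)]        # satisfied: clusters 0 and 1
    print(constraint_satisfaction(labels, must_link, cannot_link))  # 2 of 3 satisfied

    K_linear = [[1.0, 0.0], [0.0, 1.0]]
    K_const = [[1.0, 1.0], [1.0, 1.0]]
    print(combine_kernels([K_linear, K_const], [0.5, 0.5]))
```

A relaxed approach would instead turn each must-link pair into "make these points close" and each cannot-link pair into "push these points apart" under some metric; the counting function above keeps only the linkage information, which is the property the abstract argues helps generalization.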
