论文标题
图形距离和聚类
Graph Distances and Clustering
论文作者
论文摘要
通过在图形聚类上的视图,我们提出了基于共享连接性的顶点到vertex距离的定义。我们认为,共享更多连接的顶点比共享连接更少的顶点更接近彼此。我们的论文集中在广泛接受的观念上,即强簇是由高水平的诱导子图密度形成的,其中亚图代表簇。我们认为这些群集是通过对其连接性相似的顶点进行分组来形成的。在群集水平(诱导子图水平)下,我们的论文转化为平均簇内距离较低。我们的定义与通常的最短路径大地距离有所不同。在本文中,我们比较了文献中的三个距离测量。我们的基准是在集群级别汇总(平均)时,每个度量反射集群内密度的准确性。我们对使用种植的分区模型生成的合成图进行了测试,其中簇和集群内密度已预先知道。我们检查平均群内距离和群内密度之间的相关性。我们的数值实验表明,当在簇内的顶点对上平均时,Jaccard和Otsuka-ochiai提供了非常准确的密度度量。
With a view on graph clustering, we present a definition of vertex-to-vertex distance which is based on shared connectivity. We argue that vertices sharing more connections are closer to each other than vertices sharing fewer connections. Our thesis is centered on the widely accepted notion that strong clusters are formed by high levels of induced subgraph density, where subgraphs represent clusters. We argue these clusters are formed by grouping vertices deemed to be similar in their connectivity. At the cluster level (induced subgraph level), our thesis translates into low mean intra-cluster distances. Our definition differs from the usual shortest-path geodesic distance. In this article, we compare three distance measures from the literature. Our benchmark is the accuracy of each measure's reflection of intra-cluster density, when aggregated (averaged) at the cluster level. We conduct our tests on synthetic graphs generated using the planted partition model, where clusters and intra-cluster density are known in advance. We examine correlations between mean intra-cluster distances and intra-cluster densities. Our numerical experiments show that Jaccard and Otsuka-Ochiai offer very accurate measures of density, when averaged over vertex pairs within clusters.