论文标题

大规模幂律图上的以子图编程模型的高效且平衡的图形分区算法

An Efficient and Balanced Graph Partition Algorithm for the Subgraph-Centric Programming Model on Large-scale Power-law Graphs

论文作者

Zhang, Shuai, Jiang, Zite, Hou, Xingzhong, Guan, Zhen, Yuan, Mengting, You, Haihang

论文摘要

以子图为中心的编程模型是一种有前途的方法,并且已应用于许多最新的分布式图计算框架中。但是,传统的图形分区算法在处理大规模幂律图上存在很大困难。主要问题是在许多以子图中的框架中发现的通信瓶颈。详细的分析表明,通信瓶颈是由巨大的通信量或分区子图之间的极端信息失衡引起的。传统分区算法并非同时考虑这两个因素,尤其是在幂律图上。 在本文中,我们提出了一种新型的高效且平衡的顶点示意图分区算法(EBV),该算法将适当的权重授予整体通信成本和沟通平衡。我们观察到,复制的顶点的数量以及边缘和顶点分配的平衡对分布式亚仪表框架的通信模式产生了很大的影响,这进一步影响了整体性能。基于此洞察力,我们设计了一个评估功能,该函数量化了复制顶点的比例以及边缘和顶点分配的平衡作为重要参数。此外,我们通过从小到大的端孔的度量总和来对边缘处理的顺序进行排序。实验表明,与其他基于自我的分区算法相比,EBV将复制因子和通信分别降低至少21.8%和23.7%。与最新的分区算法相比,当部署在以亚图中为中心的框架中时,幂律图上的运行时间平均减少了16.8%。我们的结果表明,EBV在改善以平行大规模幂律图处理的子图框架的性能方面具有很大的潜力。

The subgraph-centric programming model is a promising approach and has been applied in many state-of-the-art distributed graph computing frameworks. However, traditional graph partition algorithms have significant difficulties in processing large-scale power-law graphs. The major problem is the communication bottleneck found in many subgraph-centric frameworks. Detailed analysis indicates that the communication bottleneck is caused by the huge communication volume or the extreme message imbalance among partitioned subgraphs. The traditional partition algorithms do not consider both factors at the same time, especially on power-law graphs. In this paper, we propose a novel efficient and balanced vertex-cut graph partition algorithm (EBV) which grants appropriate weights to the overall communication cost and communication balance. We observe that the number of replicated vertices and the balance of edge and vertex assignment have a great influence on communication patterns of distributed subgraph-centric frameworks, which further affect the overall performance. Based on this insight, We design an evaluation function that quantifies the proportion of replicated vertices and the balance of edges and vertices assignments as important parameters. Besides, we sort the order of edge processing by the sum of end-vertices' degrees from small to large. Experiments show that EBV reduces replication factor and communication by at least 21.8% and 23.7% respectively than other self-based partition algorithms. When deployed in the subgraph-centric framework, it reduces the running time on power-law graphs by an average of 16.8% compared with the state-of-the-art partition algorithm. Our results indicate that EBV has a great potential in improving the performance of subgraph-centric frameworks for the parallel large-scale power-law graph processing.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源