论文标题

2PS:用两相流的高质量边缘分区

2PS: High-Quality Edge Partitioning with Two-Phase Streaming

论文作者

Mayer, Ruben, Orujzade, Kamil, Jacobsen, Hans-Arno

论文摘要

图形分区是分布式图处理的重要预处理步骤。在边缘分区中,给定图的边缘集分为$ k $均等的分区,以便最小化跨分区的顶点的复制。流是超过单个服务器内存能力的分区图的可行方法。该图被摄入为边流,一次,一个边缘立即根据评分函数分配给分区。但是,流分区遇到了不知情的分配问题:在流中的早期边缘分区时,没有有关其余边缘的信息。结果,边缘分配通常是由平衡考虑的驱动,而所实现的复制因子也相当高。在本文中,我们提出了2PS,这是一种用于高质量边缘分区的新型两相流算法。在第一阶段,将顶点通过轻巧的流聚类算法分为簇。在第二阶段,将图形重新流程,并考虑到第一阶段的顶点的聚类,并执行边缘分区。我们的评估表明,2PS可以实现与重量随机访问分区者相媲美的复制因子,同时诱导下降较低的内存开销。

Graph partitioning is an important preprocessing step to distributed graph processing. In edge partitioning, the edge set of a given graph is split into $k$ equally-sized partitions, such that the replication of vertices across partitions is minimized. Streaming is a viable approach to partition graphs that exceed the memory capacities of a single server. The graph is ingested as a stream of edges, and one edge at a time is immediately and irrevocably assigned to a partition based on a scoring function. However, streaming partitioning suffers from the uninformed assignment problem: At the time of partitioning early edges in the stream, there is no information available about the rest of the edges. As a consequence, edge assignments are often driven by balancing considerations, and the achieved replication factor is comparably high. In this paper, we propose 2PS, a novel two-phase streaming algorithm for high-quality edge partitioning. In the first phase, vertices are separated into clusters by a lightweight streaming clustering algorithm. In the second phase, the graph is re-streamed and edge partitioning is performed while taking into account the clustering of the vertices from the first phase. Our evaluations show that 2PS can achieve a replication factor that is comparable to heavy-weight random access partitioners while inducing orders of magnitude lower memory overhead.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源