Paper Title

Tuning the Tail Latency of Distributed Queries Using Replication

Authors

Nathan Ng, Hung Le, Marco Serafini

Abstract

Querying graph data with low latency is an important requirement in application domains such as social networks and knowledge graphs. Graph queries perform multiple hops between vertices. When data is partitioned and stored across multiple servers, queries executing at one server often need to hop to vertices stored by another server. Such distributed traversals represent a performance bottleneck for low-latency queries. To reduce query latency, one can replicate remote data to make distributed traversals unnecessary, but replication is expensive and should be minimized. In this paper, we introduce the problem of finding data replication schemes that satisfy arbitrary user-defined query latency constraints with minimal replication cost. We propose a novel workload model to express data access causality, propose a family of heuristics, and introduce non-trivial sufficient conditions for their correctness. Our evaluation on two representative benchmarks shows that our algorithms enable fine-tuning query latency with data replication and can find sweet spots in the latency/replication design space.
