Paper Title

Distribution Constraints: The Chase for Distributed Data

Authors

Gaetano Geck, Frank Neven, Thomas Schwentick

Abstract

This paper introduces a declarative framework to specify and reason about distributions of data over computing nodes in a distributed setting. More specifically, it proposes distribution constraints which are tuple and equality generating dependencies (tgds and egds) extended with node variables ranging over computing nodes. In particular, they can express co-partitioning constraints and constraints about range-based data distributions by using comparison atoms. The main technical contribution is the study of the implication problem of distribution constraints. While implication is undecidable in general, relevant fragments of so-called data-full constraints are exhibited for which the corresponding implication problems are complete for EXPTIME, PSPACE and NP. These results yield bounds on deciding parallel-correctness for conjunctive queries in the presence of distribution constraints.
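As a rough illustration of what such constraints might look like, the sketch below writes R(x, y)@κ for "the fact R(x, y) is stored at node κ". This notation and both example constraints are assumptions made here for illustration; they are not taken from the abstract. The first formula is a tgd-style co-partitioning constraint: any S-fact that shares the join value x with an R-fact stored at node κ must also be stored at κ. The second is an egd-style constraint saying that each R-fact is stored at no more than one node.

```latex
% Illustrative only: the @-notation and both constraints are assumed, not quoted.
% Co-partitioning (tgd with node variable \kappa):
\[
  R(x, y)@\kappa \;\wedge\; S(x, z) \;\rightarrow\; S(x, z)@\kappa
\]
% No replication of R-facts (egd over node variables \kappa, \kappa'):
\[
  R(x, y)@\kappa \;\wedge\; R(x, y)@\kappa' \;\rightarrow\; \kappa = \kappa'
\]
```

Neither formula introduces existentially quantified data values in its head, which is one plausible reading of the "data-full" restriction mentioned above.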
