Paper Title
LinCQA: Faster Consistent Query Answering with Linear Time Guarantees
Paper Authors
Paper Abstract
Most data analytical pipelines often encounter the problem of querying inconsistent data that violate pre-determined integrity constraints. Data cleaning is an extensively studied paradigm that singles out a consistent repair of the inconsistent data. Consistent query answering (CQA) is an alternative approach to data cleaning that asks for all tuples guaranteed to be returned by a given query on all (in most cases, exponentially many) repairs of the inconsistent data. This paper identifies a class of acyclic select-project-join (SPJ) queries for which CQA can be solved via SQL rewriting with a linear time guarantee. Our rewriting method can be viewed as a generalization of Yannakakis's algorithm for acyclic joins to the inconsistent setting. We present LinCQA, a system that can output rewritings in both SQL and non-recursive Datalog rules for every query in this class. We show that LinCQA often outperforms the existing CQA systems on both synthetic and real-world workloads, and in some cases, by orders of magnitude.
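The definition of a consistent answer in the abstract (a tuple returned by the query on every repair) can be illustrated with a small brute-force sketch. This is not the LinCQA rewriting: it enumerates all repairs explicitly, which is exponential in general and is exactly the cost that a linear-time rewriting avoids. The sketch assumes the integrity constraints are primary keys, so that a repair keeps exactly one row per key value. The table `employee`, the query `names_in_sales`, and the helper names are hypothetical and chosen only for illustration.

```python
from itertools import groupby, product


def repairs(rows, key):
    """Yield every repair, assuming a primary-key constraint:
    each repair keeps exactly one row from every group of rows sharing a key."""
    ordered = sorted(rows, key=key)
    blocks = [list(group) for _, group in groupby(ordered, key=key)]
    for choice in product(*blocks):
        yield list(choice)


def consistent_answers(rows, key, query):
    """Return the answers that the query produces on every repair."""
    result = None
    for repair in repairs(rows, key):
        answers = set(query(repair))
        result = answers if result is None else result & answers
    return result if result is not None else set()


# Hypothetical table Employee(name, dept); `name` is assumed to be the primary key.
employee = [
    ("alice", "sales"),
    ("alice", "hr"),     # violates the key: conflicts with the row above
    ("bob", "sales"),
]


def names_in_sales(rows):
    """Hypothetical query: names of employees in the sales department."""
    return [name for name, dept in rows if dept == "sales"]


certain = consistent_answers(employee, key=lambda r: r[0], query=names_in_sales)
print(certain)
```

Here the table has two repairs, one per choice of row for `alice`. Only `bob` is in the sales department in both, so only `bob` is a consistent answer. With n conflicting keys of two rows each there are 2^n repairs, which is why this enumeration does not scale.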