论文标题
对预测的加入查询的列举
Ranked Enumeration of Join Queries with Projections
论文作者
论文摘要
使用订购加入查询评估是关系数据库管理系统中的基本数据处理任务。 SQL和自定义图形查询语言(例如Cypher)通过允许用户通过子句指定订单来提供此功能。在许多情况下,用户还希望快速看到第一个$ k $结果(由限制条款表示),但是$ k $的值并未预定,因为用户查询以在线方式到达。最近的工作在确定不包含任何预测的联接查询列举的最佳算法方面取得了长足的进步。在本文中,我们启动了针对预测的查询的排名顺序列举结果的问题。我们的主要结果表明,对于任何无环查询,只有在两个重要排名函数的线性时间预处理步骤之后,就可以获得接近线性(按数据库的大小)延迟算法:总和和词典顺序。对于被称为星形查询的无环查询的实际子集,我们显示出更强的结果,允许用户在更快的回答时间之间使用更多的预处理时间获得平稳的权衡。我们的结果也可扩展到包含周期和工会的查询。我们还执行了全面的实验评估,以证明我们的算法易于实施,在运行时间内,在开源RDBMS中实现的最新算法和专用图形数据库中实现了三个数量级。
Join query evaluation with ordering is a fundamental data processing task in relational database management systems. SQL and custom graph query languages such as Cypher offer this functionality by allowing users to specify the order via the ORDER BY clause. In many scenarios, the users also want to see the first $k$ results quickly (expressed by the LIMIT clause), but the value of $k$ is not predetermined as user queries are arriving in an online fashion. Recent work has made considerable progress in identifying optimal algorithms for ranked enumeration of join queries that do not contain any projections. In this paper, we initiate the study of the problem of enumerating results in ranked order for queries with projections. Our main result shows that for any acyclic query, it is possible to obtain a near-linear (in the size of the database) delay algorithm after only a linear time preprocessing step for two important ranking functions: sum and lexicographic ordering. For a practical subset of acyclic queries known as star queries, we show an even stronger result that allows a user to obtain a smooth tradeoff between faster answering time guarantees using more preprocessing time. Our results are also extensible to queries containing cycles and unions. We also perform a comprehensive experimental evaluation to demonstrate that our algorithms, which are simple to implement, improve up to three orders of magnitude in the running time over state-of-the-art algorithms implemented within open-source RDBMS and specialized graph databases.