论文标题
最佳加入算法会遇到TOP-K
Optimal Join Algorithms Meet Top-k
论文作者
论文摘要
在数据库社区中,对Top-K查询进行了深入研究,它们是仅需要“最佳”或“最有趣”结果而不是全部输出时,它们是降低查询成本的重要手段。尽管存在一些最佳结果,例如著名的阈值算法,但它们仅在相当有限的计算模型中持有,该模型不考虑大型中间结果所产生的成本,因此与典型的数据库 - 射击器成本模型不符。另一方面,避免大型中间结果的想法可以说是最佳连接算法的最新工作的主要目标,该算法使用标准的计算模型来确定算法复杂性。这项研究引起了很多兴奋,因为它承诺减少与周期的加入查询的时间复杂性,但主要集中在全输出计算上。我们认为,这两个领域可以从统一的角度进行研究,以便在通用一般类型的顶级连接查询的通用计算模型中实现最佳性。该教程有两个主要目标。首先,我们将探索和对比这两个研究领域的主要假设,概念和算法成就。其次,我们将介绍最近以及一些较旧的方法,这些方法是在交叉路口出现的,以支持有效的结合结果列举。这些与K-Shortest Path算法的经典作品和更一般的优化问题有关,其中一些可以追溯到1950年代。我们证明,这一研究需要在一般列举的艰巨背景下对一般联接查询的艰巨背景进行重新关注。
Top-k queries have been studied intensively in the database community and they are an important means to reduce query cost when only the "best" or "most interesting" results are needed instead of the full output. While some optimality results exist, e.g., the famous Threshold Algorithm, they hold only in a fairly limited model of computation that does not account for the cost incurred by large intermediate results and hence is not aligned with typical database-optimizer cost models. On the other hand, the idea of avoiding large intermediate results is arguably the main goal of recent work on optimal join algorithms, which uses the standard RAM model of computation to determine algorithm complexity. This research has created a lot of excitement due to its promise of reducing the time complexity of join queries with cycles, but it has mostly focused on full-output computation. We argue that the two areas can and should be studied from a unified point of view in order to achieve optimality in the common model of computation for a very general class of top-k-style join queries. This tutorial has two main objectives. First, we will explore and contrast the main assumptions, concepts, and algorithmic achievements of the two research areas. Second, we will cover recent, as well as some older, approaches that emerged at the intersection to support efficient ranked enumeration of join-query results. These are related to classic work on k-shortest path algorithms and more general optimization problems, some of which dates back to the 1950s. We demonstrate that this line of research warrants renewed attention in the challenging context of ranked enumeration for general join queries.