Paper Title
Comprehensive and Efficient Workload Compression
Authors
Abstract
This work studies the problem of constructing a representative workload from a given input workload of analytical queries, where the representative workload approximates the input workload with guarantees. We discuss our work in the context of workload analysis and monitoring. For example, evolving usage patterns in a database system can cause load imbalance and performance regressions, which can be controlled by monitoring those patterns over time through a representative workload. To construct such a workload in a principled manner, we formalize the notions of workload representativity and coverage. These metrics capture the intuition that the distribution of features in a compressed workload should match a target distribution, which increases representativity, and that the compressed workload should include common queries as well as outliers, which increases coverage. We show that solving this problem optimally is NP-hard and present a novel greedy algorithm that provides approximation guarantees. We compare our techniques to established algorithms in this problem space, such as sampling and clustering, and demonstrate the advantages and key trade-offs of our techniques.
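To make the intuition behind the two metrics concrete, the following is a minimal Python sketch of a greedy selection loop. It is not the paper's algorithm, and it makes no claim about approximation guarantees. Everything in it is an assumption made for illustration: queries are modeled as sets of feature names, coverage is the fraction of distinct input features that appear in the compressed workload, representativity is one minus the total variation distance to the target feature distribution (here taken to be the input workload's own distribution), and the names `greedy_compress` and `alpha` are hypothetical.

```python
from collections import Counter


def feature_distribution(queries):
    """Relative frequency of each feature across a list of feature sets."""
    counts = Counter(f for q in queries for f in q)
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()} if total else {}


def coverage(selected, workload):
    """Assumed metric: share of distinct workload features seen in `selected`."""
    all_features = set().union(*workload)
    if not all_features:
        return 0.0
    seen = set().union(*selected)
    return len(seen) / len(all_features)


def representativity(selected, target):
    """Assumed metric: 1 minus total variation distance to the target."""
    dist = feature_distribution(selected)
    keys = set(dist) | set(target)
    tv = 0.5 * sum(abs(dist.get(k, 0.0) - target.get(k, 0.0)) for k in keys)
    return 1.0 - tv


def greedy_compress(workload, k, alpha=0.5):
    """Pick up to k query indices, each step adding the query that maximizes
    alpha * coverage + (1 - alpha) * representativity (illustrative objective)."""
    target = feature_distribution(workload)
    selected = []
    remaining = list(range(len(workload)))

    def score(i):
        candidate = [workload[j] for j in selected] + [workload[i]]
        return (alpha * coverage(candidate, workload)
                + (1 - alpha) * representativity(candidate, target))

    for _ in range(min(k, len(workload))):
        best = max(remaining, key=score)  # ties resolve to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected
```

With three identical common queries and one outlier, this sketch first picks a common query, since it matches the target distribution best, and then the outlier, since it raises coverage. That mirrors the trade-off the abstract describes, under the simplified metrics assumed here.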