Paper Title

SQUID: Faster Analytics via Sampled Quantile Estimation

Paper Authors

Ran Ben-Basat, Gil Einziger, Wenchen Han, Bilal Tayh

Paper Abstract

Streaming algorithms are fundamental in the analysis of large and online datasets. A key component of many such analytic tasks is $q$-MAX, which finds the largest $q$ values in a number stream. Modern approaches attain a constant runtime by removing small items in bulk and retaining the largest $q$ items at all times. Yet, these approaches are bottlenecked by an expensive quantile calculation. This work introduces a quantile-sampling approach called SQUID and shows its benefits in multiple analytic tasks. Using this approach, we design a novel weighted heavy hitters data structure that is faster and more accurate than the existing alternatives. We also show SQUID's practicality for improving network-assisted caching systems with a hardware-based cache prototype that uses SQUID to implement the cache policy. The challenge here is that the switch's dataplane does not allow the general computation required to implement many cache policies, while its CPU is orders of magnitude slower. We overcome this issue by passing just SQUID's samples to the CPU, thus bridging this gap. In software implementations, we show that our method is up to 6.6x faster than the state-of-the-art alternatives when using real workloads. For switch-based caching, SQUID enables a wide spectrum of data-plane-based caching policies and achieves higher hit ratios than the state-of-the-art P4LRU.
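The abstract describes the mechanism only at a high level. Below is a minimal Python sketch of the sampled-quantile idea for $q$-MAX: buffer incoming values and, when the buffer fills, estimate the bulk-eviction threshold from a small random sample rather than an exact quantile computation. The class name `SampledQMax`, the parameters `gamma` and `sample_size`, and the fallback to an exact top-$q$ selection are illustrative assumptions for this sketch, not the paper's actual interface or algorithm.

```python
import heapq
import random


class SampledQMax:
    """Keeps (approximately) the largest q values seen in a stream.

    Instead of an exact quantile calculation on every flush, the eviction
    threshold is estimated from a small random sample of the buffer -- the
    general quantile-sampling idea the abstract attributes to SQUID.
    Parameter names here are illustrative, not the paper's interface.
    """

    def __init__(self, q, gamma=1.0, sample_size=64):
        self.q = q                            # number of top values to keep
        self.capacity = int((1 + gamma) * q)  # buffer size that triggers a flush
        self.sample_size = sample_size        # sample used for the quantile estimate
        self.buf = []

    def insert(self, value):
        self.buf.append(value)
        if len(self.buf) >= self.capacity:
            self._flush()

    def _flush(self):
        n = len(self.buf)
        sample = sorted(random.sample(self.buf, min(self.sample_size, n)))
        # Rank within the sample that corresponds to the q-th largest buffer item.
        rank = min(len(sample) - 1, max(0, int(len(sample) * (1 - self.q / n))))
        threshold = sample[rank]
        survivors = [v for v in self.buf if v >= threshold]
        if self.q <= len(survivors) < n:
            self.buf = survivors              # bulk eviction using the estimated quantile
        else:
            # Unlucky sample: fall back to an exact top-q selection.
            self.buf = heapq.nlargest(self.q, self.buf)

    def top_q(self):
        return heapq.nlargest(self.q, self.buf)
```

As a usage example, `SampledQMax(q=1000)` can be fed a numeric stream via `insert`, and `top_q()` returns the retained largest values; the point of the sampling step is that each flush touches only a constant-size sample instead of performing a full quantile selection over the buffer.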
