论文标题

在旋转门流上的非自适应自适应抽样

Non-Adaptive Adaptive Sampling on Turnstile Streams

论文作者

Mahabadi, Sepideh, Razenshteyn, Ilya, Woodruff, David P., Zhou, Samson

论文摘要

自适应采样是在经典集中设置中用于数据汇总问题的有用算法工具,其中整个数据集可用于执行计算的单个处理器。自适应采样反复选择基础矩阵$ \ mathbf {a} \ in \ mathbb {r}^{n \ times d} $的行,其中$ n \ gg d $,其中概率与它们与先前选择的rows子空间的距离成比例。在直觉上,自适应采样似乎仅限于流式计算模型中的琐碎多通算法,这是因为仅在上一个迭代完成后才向每一行分配采样概率的固有顺序性质。令人惊讶的是,我们通过提供第一个单通算法来表明这种情况并非如此,用于在旋转栅流中进行自适应采样,并使用Space $ \ text {poly}(d,k,k,\ log n)$,其中$ k $是自适应抽样的数量来执行。 我们的自适应采样程序在各种数据汇总问题上都有许多应用程序,这些问题要么改善了最先进的问题,要么仅在更轻松的行排式模型中研究过。我们提供了第一个用于列子集选择,子空间近似,投影聚类和量最大化的相对误差算法,这些算法最大化,在$ n $中使用Space Sublinear。我们以较低到较低级项的下限,即使对于多通算法,我们的体积最大化算法结果补充了最大化算法。通过类似的结构,我们还获得了行界模型中的体积最大化的下限,我们与竞争性上限匹配。 请参阅纸张以获取完整摘要。

Adaptive sampling is a useful algorithmic tool for data summarization problems in the classical centralized setting, where the entire dataset is available to the single processor performing the computation. Adaptive sampling repeatedly selects rows of an underlying matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$, where $n\gg d$, with probabilities proportional to their distances to the subspace of the previously selected rows. Intuitively, adaptive sampling seems to be limited to trivial multi-pass algorithms in the streaming model of computation due to its inherently sequential nature of assigning sampling probabilities to each row only after the previous iteration is completed. Surprisingly, we show this is not the case by giving the first one-pass algorithms for adaptive sampling on turnstile streams and using space $\text{poly}(d,k,\log n)$, where $k$ is the number of adaptive sampling rounds to be performed. Our adaptive sampling procedure has a number of applications to various data summarization problems that either improve state-of-the-art or have only been previously studied in the more relaxed row-arrival model. We give the first relative-error algorithms for column subset selection, subspace approximation, projective clustering, and volume maximization on turnstile streams that use space sublinear in $n$. We complement our volume maximization algorithmic results with lower bounds that are tight up to lower order terms, even for multi-pass algorithms. By a similar construction, we also obtain lower bounds for volume maximization in the row-arrival model, which we match with competitive upper bounds. See paper for full abstract.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源