Title

Regularized Submodular Maximization at Scale

Authors

Ehsan Kazemi, Shervin Minaee, Moran Feldman, Amin Karbasi

Abstract

In this paper, we propose scalable methods for maximizing a regularized submodular function $f = g - \ell$ expressed as the difference between a monotone submodular function $g$ and a modular function $\ell$. Indeed, submodularity is inherently related to the notions of diversity, coverage, and representativeness. In particular, finding the mode of many popular probabilistic models of diversity, such as determinantal point processes, submodular probabilistic models, and strongly log-concave distributions, involves maximization of (regularized) submodular functions. Since a regularized function $f$ can potentially take on negative values, the classic theory of submodular maximization, which heavily relies on the non-negativity assumption of submodular functions, may not be applicable. To circumvent this challenge, we develop the first one-pass streaming algorithm for maximizing a regularized submodular function subject to a $k$-cardinality constraint. It returns a solution $S$ with the guarantee that $f(S) \geq (\phi^{-2}-\varepsilon) \cdot g(OPT) - \ell(OPT)$, where $\phi$ is the golden ratio. Furthermore, we develop the first distributed algorithm that returns a solution $S$ with the guarantee that $\mathbb{E}[f(S)] \geq (1-\varepsilon)\left[(1-e^{-1}) \cdot g(OPT) - \ell(OPT)\right]$ in $O(1/\varepsilon)$ rounds of MapReduce computation, without keeping multiple copies of the entire dataset in each round (as is usually done). We should highlight that our result, even for the unregularized case where the modular term $\ell$ is zero, improves the memory and communication complexity of the existing work by a factor of $O(1/\varepsilon)$ while arguably providing a simpler distributed algorithm and a unifying analysis. We also empirically study the performance of our scalable methods on a set of real-life applications, including finding the mode of distributions, data summarization, and product recommendation.
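To make the objective concrete, here is a minimal illustrative sketch (not the paper's streaming or distributed algorithm): a regularized objective $f = g - \ell$ where $g$ is a monotone submodular coverage function and $\ell$ is a modular (additive) cost, optimized by a plain greedy baseline. All data and names below are hypothetical; note that $f$ can go negative, which is exactly why the classic non-negativity-based theory may not apply.

```python
# Illustrative sketch, NOT the algorithm from the paper: regularized
# submodular objective f = g - ell with a naive greedy baseline.

def make_objective(coverage, cost):
    """coverage: dict element -> set of items it covers (g is set cover, monotone submodular).
    cost: dict element -> non-negative cost (ell is modular, i.e. additive)."""
    def g(S):
        covered = set()
        for e in S:
            covered |= coverage[e]
        return len(covered)

    def ell(S):
        return sum(cost[e] for e in S)

    def f(S):
        return g(S) - ell(S)  # may be negative

    return f

def greedy(f, ground_set, k):
    """Pick up to k elements, each time taking the largest marginal gain
    f(S + e) - f(S); stop early once no element improves f."""
    S = []
    for _ in range(k):
        gains = [(f(S + [e]) - f(S), e) for e in ground_set if e not in S]
        best_gain, best = max(gains)
        if best_gain <= 0:
            break
        S.append(best)
    return S

# Hypothetical toy instance: four elements covering items 1..5.
coverage = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2, 3, 4, 5}}
cost = {"a": 0.5, "b": 0.5, "c": 2.0, "d": 1.5}
f = make_objective(coverage, cost)
S = greedy(f, list(coverage), k=2)  # picks "d" (gain 3.5), then stops
```

This greedy carries no approximation guarantee once $f$ can be negative; the paper's contribution is precisely streaming and MapReduce algorithms whose guarantees ($\phi^{-2}-\varepsilon$ and $1-e^{-1}$ factors on $g(OPT)$, respectively) hold in that regime.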
