Paper Title

Near-Linear Sample Complexity for $L_p$ Polynomial Regression

Paper Authors

Meyer, Raphael A., Musco, Cameron, Musco, Christopher, Woodruff, David P., Zhou, Samson

Paper Abstract

We study $L_p$ polynomial regression. Given query access to a function $f:[-1,1] \rightarrow \mathbb{R}$, the goal is to find a degree $d$ polynomial $\hat{q}$ such that, for a given parameter $\varepsilon > 0$, $$ \|\hat{q}-f\|_p\le (1+\varepsilon) \cdot \min_{q:\text{deg}(q)\le d}\|q-f\|_p. $$ Here $\|\cdot\|_p$ is the $L_p$ norm, $\|g\|_p = (\int_{-1}^1 |g(t)|^p dt)^{1/p}$. We show that querying $f$ at points randomly drawn from the Chebyshev measure on $[-1,1]$ is a near-optimal strategy for polynomial regression in all $L_p$ norms. In particular, to find $\hat q$, it suffices to sample $O(d\, \frac{\text{polylog}\,d}{\text{poly}\,\varepsilon})$ points from $[-1,1]$ with probabilities proportional to this measure. While the optimal sample complexity for polynomial regression was well understood for $L_2$ and $L_\infty$, our result is the first that achieves sample complexity linear in $d$ and error $(1+\varepsilon)$ for other values of $p$ without any assumptions. Our result requires two main technical contributions. The first concerns $p\leq 2$, for which we provide explicit bounds on the $L_p$ Lewis weight function of the infinite linear operator underlying polynomial regression. Using tools from the orthogonal polynomial literature, we show that this function is bounded by the Chebyshev density. Our second key contribution is to take advantage of the structure of polynomials to reduce the $p>2$ case to the $p\leq 2$ case. By doing so, we obtain a better sample complexity than what is possible for general $p$-norm linear regression problems, for which $\Omega(d^{p/2})$ samples are required.
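The sampling strategy described in the abstract can be illustrated with a short sketch: draw points on $[-1,1]$ in proportion to the Chebyshev density $1/(\pi\sqrt{1-t^2})$, query $f$ only at those points, and minimize an importance-weighted empirical $L_p$ loss over degree-$d$ polynomials. This is a minimal illustration under our own assumptions, not the paper's algorithm: the helper names (`chebyshev_sample`, `fit_lp_polynomial`) are hypothetical, and SciPy's generic optimizer stands in for whatever solver and Lewis-weight-based reweighting the paper actually uses.

```python
# Minimal sketch (not the paper's exact algorithm) of Chebyshev-measure sampling
# for L_p polynomial regression.
import numpy as np
from scipy.optimize import minimize


def chebyshev_sample(n, rng):
    # Draw n points from the Chebyshev measure on [-1, 1], whose density is
    # w(t) = 1 / (pi * sqrt(1 - t^2)); if U ~ Uniform(0, 1), then cos(pi * U) has this law.
    return np.cos(np.pi * rng.uniform(size=n))


def fit_lp_polynomial(f, d, p, n, seed=0):
    """Fit a degree-d polynomial to f in approximate L_p norm from n Chebyshev samples."""
    rng = np.random.default_rng(seed)
    t = chebyshev_sample(n, rng)
    y = f(t)                                      # query f only at the sampled points
    w = 1.0 / (np.pi * np.sqrt(1.0 - t ** 2))     # Chebyshev density at the samples
    V = np.polynomial.chebyshev.chebvander(t, d)  # Chebyshev basis, shape (n, d + 1)

    def objective(c):
        # Importance-weighted empirical estimate of ||q - f||_p^p = \int |q(t) - f(t)|^p dt
        r = V @ c - y
        return np.mean(np.abs(r) ** p / w)

    c0 = np.linalg.lstsq(V, y, rcond=None)[0]     # least-squares warm start
    res = minimize(objective, c0, method="Powell")
    return np.polynomial.chebyshev.Chebyshev(res.x)


# Example: approximate |t| with a degree-10 polynomial in L_1.
q_hat = fit_lp_polynomial(np.abs, d=10, p=1, n=500)
```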
