论文标题
具有高概率的离散分布的最佳测试
Optimal Testing of Discrete Distributions with High Probability
论文作者
论文摘要
我们研究了测试离散分布的问题,重点是高概率制度。具体来说,给定一个或多个离散分布的样本,属性$ \ Mathcal {p} $和参数$ 0 <ε,Δ<1 $,我们希望以可能以$1-Δ$}的概率区分{\ em $1-Δ$},这些分布是否满足$ \ \ m natercal {p} $ $ $ $ $ - $ - $ - $ - $ friatie $ \ n $ \ n $ \ n $ \ n $ \ n $ \ n $ \ n pp riaties。大多数先前的分配测试工作都研究了恒定的置信案例(对应于$δ=ω(1)$),并为一系列属性提供了最佳的样品测试仪。虽然始终可以通过黑盒放大来提高任何此类测试仪的置信度概率,但这种通用的增强方法通常会导致亚最佳样本边界。 在这里,我们研究以下广泛的问题:对于给定的属性$ \ MATHCAL {P} $,我们可以{\ em表征}测试$ \ Mathcal {p} $的样本复杂性作为所有相关问题参数的函数,包括错误概率$δ$吗?在这项工作之前,统一测试是唯一在这种情况下表征样本复杂性的唯一统计任务。作为我们的主要结果,我们提供了第一个算法,以作为所有相关参数的函数在恒定因素内,在恒定因素内,在恒定因子内是最佳的独立性测试。我们还在这些问题的样本复杂性上显示了匹配的信息理论下限。我们的技术自然会扩展到相关问题的最佳测试人员。为了说明我们方法的一般性,我们提供了最佳的算法,用于测试分布的集合并使用不等大小的样本测试接近度。
We study the problem of testing discrete distributions with a focus on the high probability regime. Specifically, given samples from one or more discrete distributions, a property $\mathcal{P}$, and parameters $0< ε, δ<1$, we want to distinguish {\em with probability at least $1-δ$} whether these distributions satisfy $\mathcal{P}$ or are $ε$-far from $\mathcal{P}$ in total variation distance. Most prior work in distribution testing studied the constant confidence case (corresponding to $δ= Ω(1)$), and provided sample-optimal testers for a range of properties. While one can always boost the confidence probability of any such tester by black-box amplification, this generic boosting method typically leads to sub-optimal sample bounds. Here we study the following broad question: For a given property $\mathcal{P}$, can we {\em characterize} the sample complexity of testing $\mathcal{P}$ as a function of all relevant problem parameters, including the error probability $δ$? Prior to this work, uniformity testing was the only statistical task whose sample complexity had been characterized in this setting. As our main results, we provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors, as a function of all relevant parameters. We also show matching information-theoretic lower bounds on the sample complexity of these problems. Our techniques naturally extend to give optimal testers for related problems. To illustrate the generality of our methods, we give optimal algorithms for testing collections of distributions and testing closeness with unequal sized samples.