Paper title
Catoni-style confidence sequences for heavy-tailed mean estimation
Paper authors
Paper abstract
A confidence sequence (CS) is a sequence of confidence intervals that is valid at arbitrary data-dependent stopping times. Confidence sequences are useful in applications such as A/B testing, multi-armed bandits, off-policy evaluation, and election auditing. We present three approaches to constructing a confidence sequence for the population mean, under the minimal assumption that only an upper bound $\sigma^2$ on the variance is known. While previous works rely on light-tail assumptions like boundedness or sub-Gaussianity (under which all moments of a distribution exist), the confidence sequences in our work are able to handle data from a wide range of heavy-tailed distributions. The best among our three methods, the Catoni-style confidence sequence, performs remarkably well in practice, essentially matching the state-of-the-art methods for $\sigma^2$-sub-Gaussian data, and provably attains the $\sqrt{\log \log t/t}$ lower bound due to the law of the iterated logarithm. Our findings have important implications for sequential experimentation with unbounded observations, since the $\sigma^2$-bounded-variance assumption is more realistic and easier to verify than $\sigma^2$-sub-Gaussianity (which implies the former). We also extend our methods to data with infinite variance but a finite $p$-th central moment ($1<p<2$).
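To make the Catoni-style construction concrete, the following is a minimal sketch, not the authors' implementation. It assumes the interval at time $t$ is the set of candidate means $m$ satisfying $|\sum_{i \le t} \phi(\lambda_i (X_i - m))| \le \sigma^2 \sum_{i \le t} \lambda_i^2 / 2 + \log(2/\alpha)$, where $\phi$ is a Catoni influence function. The function names and the tuning sequence `illustrative_lams` are hypothetical; in particular, this sequence is chosen only for illustration and is not claimed to achieve the $\sqrt{\log \log t/t}$ rate stated in the abstract.

```python
import math
import random


def catoni_phi(x):
    """Influence function with -log(1 - x + x^2/2) <= phi(x) <= log(1 + x + x^2/2)."""
    if x >= 0:
        return math.log1p(x + 0.5 * x * x)
    return -math.log1p(-x + 0.5 * x * x)


def illustrative_lams(n, sigma2, alpha):
    """Hypothetical deterministic tuning sequence (an assumption, not the paper's)."""
    c = 2.0 * math.log(2.0 / alpha) / sigma2
    return [min(math.sqrt(c / (t * math.log(t + 1.0))), 1.0) for t in range(1, n + 1)]


def _solve_decreasing(f, target, lo, hi, iters=100):
    """Bisection for f(m) = target, where f is decreasing and f(lo) >= target >= f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def catoni_interval(xs, lams, sigma2, alpha=0.05):
    """Interval at time t = len(xs) under the assumed defining inequality."""
    radius = 0.5 * sigma2 * sum(l * l for l in lams) + math.log(2.0 / alpha)

    def score(m):
        # Strictly decreasing in m because phi is increasing and every lam > 0.
        return sum(catoni_phi(l * (x - m)) for x, l in zip(xs, lams))

    # Expand a bracket until the score crosses +radius on the left
    # and -radius on the right.
    left, right = min(xs), max(xs)
    step = 1.0
    while score(left) < radius:
        left -= step
        step *= 2.0
    step = 1.0
    while score(right) > -radius:
        right += step
        step *= 2.0

    lower = _solve_decreasing(score, radius, left, right)
    upper = _solve_decreasing(score, -radius, left, right)
    return lower, upper


def demo(n=200, seed=0):
    """Heavy-tailed example: Student t with 3 degrees of freedom (mean 0, variance 3)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(1.5, 2.0) / 3.0)
          for _ in range(n)]
    lams = illustrative_lams(n, sigma2=3.0, alpha=0.05)
    return catoni_interval(xs, lams, sigma2=3.0, alpha=0.05)


if __name__ == "__main__":
    print(demo())
```

Because the score is strictly decreasing in $m$, the admissible set is a single interval, so two bisection searches suffice. Each call costs time linear in $t$ per evaluation; a sequential deployment would recompute the interval as observations arrive and could take the running intersection of the intervals so far.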