Paper Title
Statistical Hypothesis Testing Based on Machine Learning: Large Deviations Analysis
Paper Authors
Paper Abstract
We study the performance -- and specifically the rate at which the error probability converges to zero -- of Machine Learning (ML) classification techniques. Leveraging the theory of large deviations, we provide the mathematical conditions for an ML classifier to exhibit error probabilities that vanish exponentially, say $\sim \exp\left(-n\,I + o(n) \right)$, where $n$ is the number of informative observations available for testing (or another relevant parameter, such as the size of the target in an image) and $I$ is the error rate. Such conditions depend on the Fenchel-Legendre transform of the cumulant-generating function of the Data-Driven Decision Function (D3F, i.e., the statistic that is thresholded before the final binary decision is made) learned in the training phase. As such, the D3F and, consequently, the related error rate $I$ depend on the given training set, which is assumed to be of finite size. Interestingly, these conditions can be verified and tested numerically by exploiting the available dataset, or a synthetic dataset generated according to the available information on the underlying statistical model. In other words, both the convergence to zero of the classification error probability and its rate can be computed on a portion of the dataset available for training. Consistently with large deviations theory, we also establish the convergence, for $n$ large enough, of the normalized D3F statistic to a Gaussian distribution. This property is exploited to set a desired asymptotic false alarm probability, which empirically turns out to be accurate even for quite realistic values of $n$. Furthermore, approximate error probability curves $\sim \zeta_n \exp\left(-n\,I \right)$ are provided, thanks to a refined asymptotic derivation (often referred to as exact asymptotics), where $\zeta_n$ represents the most representative sub-exponential terms of the error probabilities.
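The abstract states that the error exponent $I$ is given by the Fenchel-Legendre transform of the D3F's cumulant-generating function and that it can be estimated numerically from available or synthetic data. The Python sketch below illustrates that computation on i.i.d. per-observation D3F values; the function name `empirical_error_rate`, the Gaussian toy model, the threshold value, and the grid for the transform variable are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np


def empirical_error_rate(d3f_samples, threshold=0.0, s_grid=None):
    """Estimate the large-deviations exponent I as the Fenchel-Legendre
    transform of the empirical cumulant-generating function (CGF) of the
    per-observation D3F statistic, evaluated at the decision threshold.

    d3f_samples : 1-D array of per-observation D3F values drawn under the
                  hypothesis whose error exponent we want to estimate.
    threshold   : decision threshold applied to the averaged D3F statistic
                  (illustrative assumption).
    s_grid      : grid for the transform variable s >= 0 (illustrative).
    """
    x = np.asarray(d3f_samples, dtype=float)
    if s_grid is None:
        s_grid = np.linspace(0.0, 10.0, 2001)

    # Empirical CGF: kappa(s) = log E[exp(s * X)], estimated by a sample
    # mean; the log-sum-exp reduction keeps the evaluation numerically stable.
    def cgf(s):
        return np.logaddexp.reduce(s * x) - np.log(x.size)

    kappa = np.array([cgf(s) for s in s_grid])

    # Fenchel-Legendre transform: I(gamma) = sup_s { s * gamma - kappa(s) }.
    # The grid contains s = 0, so the estimate is nonnegative by construction.
    return float(np.max(s_grid * threshold - kappa))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy model: per-observation D3F values ~ N(-0.5, 1), threshold at 0.
    samples = rng.normal(loc=-0.5, scale=1.0, size=100_000)
    I_hat = empirical_error_rate(samples, threshold=0.0)
    print(f"Estimated error exponent I ~ {I_hat:.4f}")
```

Under these toy assumptions the exact exponent is $(\gamma-\mu)^2/(2\sigma^2) = 0.125$, which the empirical estimate approaches as the number of D3F samples grows; for a trained classifier one would instead feed in D3F values computed on a held-out portion of the training data, as the abstract suggests.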