Paper title
Two-sample Test using Projected Wasserstein Distance
Paper authors
Paper abstract
We develop a projected Wasserstein distance for the two-sample test, a fundamental problem in statistics and machine learning: given two sets of samples, determine whether they come from the same distribution. In particular, we aim to circumvent the curse of dimensionality in the Wasserstein distance: in high dimensions its testing power diminishes, which is inherently due to the slow concentration of Wasserstein metrics in high-dimensional spaces. A key contribution is to couple the distance with an optimal projection: we seek the low-dimensional linear mapping that maximizes the Wasserstein distance between the projected probability distributions. We characterize the finite-sample convergence rate on integral probability metrics (IPMs) and present practical algorithms for computing this metric. Numerical examples validate our theoretical results.
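To make the idea of maximizing the Wasserstein distance over linear projections concrete, here is a minimal Python sketch. It is not the algorithm from the paper, which the abstract does not spell out. As simplifying assumptions, it restricts the projection to one dimension, requires both samples to have the same size, and replaces the optimization with a plain random search over unit directions. The names `projected_w1`, `n_dirs`, and `seed` are hypothetical.

```python
import numpy as np


def projected_w1(x, y, n_dirs=500, seed=0):
    """Approximate max over unit directions of the 1-D W1 distance.

    Assumptions (not from the paper): 1-D projections only, equal sample
    sizes, and random search instead of a proper optimizer.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("this sketch needs samples of identical shape")

    rng = np.random.default_rng(seed)
    # Draw candidate directions and normalize them to unit length.
    dirs = rng.standard_normal((n_dirs, x.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

    # Project both samples onto every direction, then sort each column.
    # For equal-size 1-D samples, W1 is the mean absolute difference
    # between the sorted values.
    px = np.sort(x @ dirs.T, axis=0)
    py = np.sort(y @ dirs.T, axis=0)
    values = np.mean(np.abs(px - py), axis=0)

    best = int(np.argmax(values))
    return float(values[best]), dirs[best]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal((200, 20))
    y = rng.standard_normal((200, 20))
    y[:, 0] += 1.0  # the two distributions differ along one coordinate only
    stat, direction = projected_w1(x, y)
    print("statistic:", stat)
    print("weight on the shifted coordinate:", abs(direction[0]))
```

The returned value is a lower bound on the true maximum over directions, because random search only visits finitely many candidates. To turn it into a test, the statistic would still need a rejection threshold, for example from a permutation procedure, which is left out here.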