Paper title
Kernel Two-Sample and Independence Tests for Non-Stationary Random Processes
Paper authors
Abstract
Two-sample and independence tests based on the kernel statistics MMD (maximum mean discrepancy) and HSIC (Hilbert-Schmidt independence criterion) have shown remarkable results on i.i.d. data and stationary random processes. However, these statistics are not directly applicable to non-stationary random processes, a prevalent form of data in many scientific disciplines. In this work, we extend the application of MMD and HSIC to non-stationary settings by assuming access to independent realisations of the underlying random process. These realisations, in the form of non-stationary time series measured on the same temporal grid, can then be viewed as i.i.d. samples from a multivariate probability distribution (one dimension per grid point), to which MMD and HSIC can be applied. We further show how to choose suitable kernels over these high-dimensional spaces by maximising the estimated test power with respect to the kernel hyper-parameters. In experiments on synthetic data, our proposed approaches achieve higher test power than current state-of-the-art functional or multivariate two-sample and independence tests. Finally, we apply our methods to a real socio-economic dataset as an example application.
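The central idea, treating independent realisations on a shared time grid as i.i.d. vectors in R^T, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a Gaussian (RBF) kernel on the stacked series, a permutation test for the null distribution, and picks the bandwidth by grid search over multiples of the median heuristic. Each candidate is scored by the ratio of the unbiased MMD² estimate to its estimated standard deviation under the alternative, a common proxy for test power that may differ from the paper's exact criterion. The names `random_walk`, `power_proxy`, `select_bandwidth` and the scale grid are hypothetical.

```python
import math
import random
import statistics


def random_walk(rng, scale, T=20):
    """Hypothetical non-stationary process for illustration: Var(X_t) grows with t."""
    path, x = [], 0.0
    for _ in range(T):
        x += rng.gauss(0.0, scale)
        path.append(x)
    return path


def gaussian_kernel(x, y, bandwidth):
    """RBF kernel between two series on the same grid, viewed as vectors in R^T."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def median_bandwidth(samples):
    """Median pairwise Euclidean distance (median heuristic), used as a base scale."""
    n = len(samples)
    return statistics.median([math.dist(samples[i], samples[j])
                              for i in range(n) for j in range(i + 1, n)])


def kernel_matrix(samples, bandwidth):
    """Symmetric Gram matrix of the RBF kernel over the given samples."""
    n = len(samples)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            K[i][j] = K[j][i] = gaussian_kernel(samples[i], samples[j], bandwidth)
    return K


def mmd2_unbiased(K, idx_x, idx_y):
    """Unbiased MMD^2 estimate from a pooled Gram matrix and two index sets."""
    m, n = len(idx_x), len(idx_y)
    k_xx = sum(K[i][j] for i in idx_x for j in idx_x if i != j) / (m * (m - 1))
    k_yy = sum(K[i][j] for i in idx_y for j in idx_y if i != j) / (n * (n - 1))
    k_xy = sum(K[i][j] for i in idx_x for j in idx_y) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


def power_proxy(xs, ys, bandwidth, eps=1e-8):
    """MMD^2 / estimated std. dev. under H1, a proxy for test power (needs len(xs) == len(ys))."""
    n = len(xs)
    K = kernel_matrix(list(xs) + list(ys), bandwidth)
    # H_ij = k(x_i, x_j) + k(y_i, y_j) - k(x_i, y_j) - k(y_i, x_j)
    H = [[K[i][j] + K[n + i][n + j] - K[i][n + j] - K[n + i][j] for j in range(n)]
         for i in range(n)]
    mmd2 = sum(H[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    row_sums = [sum(row) for row in H]
    var = 4.0 / n ** 3 * sum(r * r for r in row_sums) - 4.0 / n ** 4 * sum(row_sums) ** 2
    return mmd2 / math.sqrt(max(var, 0.0) + eps)


def select_bandwidth(xs, ys, scales=(0.25, 0.5, 1.0, 2.0, 4.0)):
    """Grid search over multiples of the median heuristic, keeping the best power proxy."""
    base = median_bandwidth(list(xs) + list(ys))
    return max((s * base for s in scales), key=lambda bw: power_proxy(xs, ys, bw))


def mmd_permutation_test(xs, ys, bandwidth, n_permutations=200, seed=0):
    """Permutation two-sample test; returns (MMD^2 estimate, p-value)."""
    rng = random.Random(seed)
    K = kernel_matrix(list(xs) + list(ys), bandwidth)
    m, idx = len(xs), list(range(len(xs) + len(ys)))
    observed = mmd2_unbiased(K, idx[:m], idx[m:])
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(idx)  # random relabelling of the pooled sample under H0
        if mmd2_unbiased(K, idx[:m], idx[m:]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_permutations + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [random_walk(rng, 1.0) for _ in range(80)]
    ys = [random_walk(rng, 3.0) for _ in range(80)]
    bw = select_bandwidth(xs[:40], ys[:40])               # choose kernel on a training split
    stat, p = mmd_permutation_test(xs[40:], ys[40:], bw)  # test on held-out realisations
    print(f"bandwidth={bw:.2f}  MMD^2={stat:.4f}  p={p:.4f}")
```

Choosing the kernel on one split and testing on a held-out split keeps the permutation p-value valid; reusing the same realisations for both steps would inflate the false-positive rate.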
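The same vectorisation carries over to the independence test. The sketch below, again an illustration rather than the paper's code, computes the biased HSIC statistic (1/n²) tr(KHLH) between paired realisations (x_i, y_i), where H = I - (1/n)11ᵀ is the centring matrix, and obtains a p-value by permuting the y-realisations, which breaks the pairing while keeping both marginal distributions. By the cyclic property of the trace, only K needs to be centred once. Bandwidths here use the median heuristic instead of power-based selection; `random_walk`, `center` and `hsic_permutation_test` are hypothetical names.

```python
import math
import random
import statistics


def random_walk(rng, scale, T=20):
    """Hypothetical non-stationary process for illustration: Var(X_t) grows with t."""
    path, x = [], 0.0
    for _ in range(T):
        x += rng.gauss(0.0, scale)
        path.append(x)
    return path


def gaussian_gram(samples, bandwidth):
    """RBF Gram matrix; each sample is a whole time series treated as a vector in R^T."""
    n = len(samples)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            d2 = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            G[i][j] = G[j][i] = math.exp(-d2 / (2.0 * bandwidth ** 2))
    return G


def median_bandwidth(samples):
    """Median pairwise Euclidean distance (median heuristic)."""
    n = len(samples)
    return statistics.median([math.dist(samples[i], samples[j])
                              for i in range(n) for j in range(i + 1, n)])


def center(G):
    """Double-centre a Gram matrix, i.e. compute H G H with H = I - (1/n) 11^T."""
    n = len(G)
    row = [sum(r) / n for r in G]
    col = [sum(G[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[G[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def hsic_biased(Kc, L, perm):
    """(1/n^2) tr(K H L H) = (1/n^2) sum_ij (HKH)_ij L_ij, with the y-sample reordered by perm."""
    n = len(Kc)
    return sum(Kc[i][j] * L[perm[i]][perm[j]] for i in range(n) for j in range(n)) / n ** 2


def hsic_permutation_test(xs, ys, n_permutations=200, seed=0):
    """Permutation independence test on paired realisations (xs[i], ys[i])."""
    rng = random.Random(seed)
    Kc = center(gaussian_gram(xs, median_bandwidth(xs)))
    L = gaussian_gram(ys, median_bandwidth(ys))
    perm = list(range(len(xs)))
    observed = hsic_biased(Kc, L, perm)  # identity permutation: the observed pairing
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(perm)  # break the pairing to simulate independence
        if hsic_biased(Kc, L, perm) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_permutations + 1)


if __name__ == "__main__":
    rng = random.Random(2)
    xs = [random_walk(rng, 1.0) for _ in range(50)]
    # y depends on x through a noisy, non-linear transform of the same realisation
    ys = [[v ** 2 + rng.gauss(0.0, 1.0) for v in x] for x in xs]
    stat, p = hsic_permutation_test(xs, ys)
    print(f"HSIC={stat:.5f}  p={p:.4f}")
```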