Paper Title

Fast Kernel Density Estimation with Density Matrices and Random Fourier Features

Authors

Gallego, Joseph A.; Osorio, Juan F.; González, Fabio A.

Abstract

Kernel density estimation (KDE) is one of the most widely used nonparametric density estimation methods. Because it is memory-based, i.e., it uses the entire training data set for every prediction, it is unsuitable for most current big data applications. Several strategies, such as tree-based or hashing-based estimators, have been proposed to improve the efficiency of KDE. The novel density matrix kernel density estimation method (DMKDE) uses density matrices, a quantum mechanical formalism, and random Fourier features, an explicit kernel approximation, to produce density estimates. The method has its roots in KDE and can be considered an approximation of it that avoids the memory-based restriction. In this paper, we systematically evaluate the DMKDE algorithm and compare it with other state-of-the-art fast procedures for approximating KDE on different synthetic data sets. Our experimental results show that DMKDE is on par with its competitors in computing density estimates and shows advantages on high-dimensional data. We have made all the code available as an open source software repository.
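
The contrast drawn in the abstract, between a memory-based estimator and one built from a density matrix over random Fourier features, can be illustrated with a minimal NumPy sketch. This is not the authors' released code. The function names, the Gaussian kernel parameterization exp(-gamma * ||x - y||^2), the number of features, and the toy data are all assumptions made for illustration. Because the quadratic form phi(x)^T rho phi(x) approximates the average squared kernel, the sketch compares against an exact KDE with kernel exp(-2 * gamma * ||x - y||^2).

```python
import numpy as np

def exact_kde(train, query, gamma):
    # Memory-based baseline: every query touches every training point.
    # Kernel: exp(-2 * gamma * ||x - y||^2), normalized to integrate to 1.
    d = train.shape[1]
    sq_dist = ((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=2)
    norm = (2.0 * gamma / np.pi) ** (d / 2)
    return norm * np.exp(-2.0 * gamma * sq_dist).mean(axis=1)

def make_rff(d, n_features, gamma, rng):
    # Random Fourier features whose inner product approximates
    # k(x, y) = exp(-gamma * ||x - y||^2).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return lambda X: np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def fit_density_matrix(train, phi):
    # rho = average outer product of the training feature vectors.
    # Its size depends on the number of features, not on len(train).
    Z = phi(train)
    return Z.T @ Z / len(train)

def dm_density(query, phi, rho, gamma):
    # phi(x)^T rho phi(x) approximates the mean of k(x, x_i)^2,
    # i.e. a Gaussian KDE with parameter 2 * gamma (up to normalization).
    d = query.shape[1]
    Zq = phi(query)
    norm = (2.0 * gamma / np.pi) ** (d / 2)
    return norm * ((Zq @ rho) * Zq).sum(axis=1)

# Toy data (hypothetical): 2-D standard normal samples.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 2))
query = rng.normal(size=(20, 2))
gamma = 0.5

phi = make_rff(d=2, n_features=2000, gamma=gamma, rng=rng)
rho = fit_density_matrix(train, phi)

exact = exact_kde(train, query, gamma)
approx = dm_density(query, phi, rho, gamma)
print("max abs difference:", np.max(np.abs(approx - exact)))
```

Once rho has been computed, the training set is no longer needed at prediction time: each query costs a fixed amount of work determined by the feature dimension, which is the sense in which the approach avoids the memory-based restriction. The approximation quality in this sketch depends on the number of random features chosen.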
