Title
Reproducing Kernels and New Approaches in Compositional Data Analysis
Authors
Abstract
Compositional data, such as human gut microbiomes, consist of non-negative variables for which only the values relative to the other variables are available. Analyzing such data requires careful treatment of the geometry of the data. A common geometric understanding of compositional data is via a regular simplex. Most existing approaches rely on log-ratio or power transformations to overcome the innate simplicial geometry. In this work, based on the key observation that compositional data are projective in nature, and on the intrinsic connection between projective and spherical geometry, we re-interpret the compositional domain as the quotient of a sphere by a group action, equipped with the quotient topology. This re-interpretation allows us to understand the function space on compositional domains in terms of that on spheres, and to use spherical harmonics theory together with reflection group actions to construct a compositional Reproducing Kernel Hilbert Space (RKHS). This construction of an RKHS for compositional data opens wide research avenues for future methodological development. In particular, well-developed kernel embedding methods can now be introduced to compositional data analysis. The polynomial nature of the compositional RKHS has both theoretical and computational benefits. The wide applicability of the proposed theoretical framework is exemplified with nonparametric density estimation and kernel exponential families for compositional data.
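The quotient-of-a-sphere idea can be illustrated with a minimal sketch. It is not the authors' construction, and it rests on two assumptions. First, a composition is placed on the unit sphere by the square-root map, one common identification of the simplex with the positive orthant of the sphere. Second, a reflection-invariant kernel is obtained by averaging an ordinary polynomial kernel over all coordinate sign flips. The function names `to_sphere` and `invariant_poly_kernel` are hypothetical, and the averaging enumerates all 2^d sign patterns, so it is only practical for a small number of parts.

```python
import itertools
import numpy as np

def to_sphere(x):
    """Map a composition (non-negative parts, arbitrary positive scale) to the unit sphere.

    Assumption: the square-root map is used as the identification.
    """
    x = np.asarray(x, dtype=float)
    p = x / x.sum()      # closure: only relative values matter
    return np.sqrt(p)    # sum(sqrt(p)**2) == 1, so the point lies on the sphere

def invariant_poly_kernel(u, v, degree=2):
    """Average the polynomial kernel (1 + <u, v>)**degree over all sign flips of u.

    The result is unchanged when any coordinate of u or v changes sign,
    so it is well defined on the sphere modulo coordinate reflections.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    sign_patterns = list(itertools.product([1.0, -1.0], repeat=len(u)))
    total = 0.0
    for signs in sign_patterns:
        total += (1.0 + np.dot(np.array(signs) * u, v)) ** degree
    return total / len(sign_patterns)

# Usage: two compositions that differ only by scale map to the same point.
a = to_sphere([1, 1, 2])
b = to_sphere([10, 10, 20])
k_ab = invariant_poly_kernel(a, b)
```

For `degree=2` the cross terms cancel under the averaging and the kernel reduces to 1 + Σ p_i q_i in the closed proportions p and q, a small instance of a polynomial kernel on the compositional domain.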