Paper title
Hole or grain? A Section Pursuit Index for Finding Hidden Structure in Multiple Dimensions
Authors
Abstract
Multivariate data is often visualized using linear projections, produced by techniques such as principal component analysis, linear discriminant analysis, and projection pursuit. A problem with projections is that they obscure low- and high-density regions near the center of the distribution. Sections, or slices, can help to reveal them. This paper develops a section pursuit method, building on the extensive work in projection pursuit, to search for interesting slices of the data. Linear projections are used to define sections of the parameter space, and to calculate interestingness by comparing the distribution of observations inside and outside a section. By optimizing this index, it is possible to reveal features such as holes (low density) or grains (high density). The optimization is incorporated into a guided tour so that the search for structure can be dynamic. The approach can be useful for problems where data distributions depart from uniform or normal, as in visually exploring nonlinear manifolds and functions in multivariate space. Two applications of section pursuit are shown: exploring decision boundaries from classification models, and exploring subspaces induced by complex inequality conditions from models with multiple parameters. The new methods are available in R, in the tourr package.
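The idea of scoring a slice by comparing observations inside and outside it can be illustrated with a small sketch. This is not the index defined in the paper, and it does not use the tourr API. It is a simplified stand-in, written in Python with only the standard library. The function name `toy_section_index`, the radial binning, the slice half-thickness `h`, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def toy_section_index(points, basis, center, h, n_bins=5, r_max=1.0):
    """Toy score (hypothetical, not the paper's index): compare radial bin
    proportions of projected points inside versus outside a slice."""
    inside = [0] * n_bins
    outside = [0] * n_bins
    for x in points:
        d = [xi - ci for xi, ci in zip(x, center)]
        # Coordinates in the projection plane; basis rows are assumed orthonormal.
        proj = [sum(bi * di for bi, di in zip(b, d)) for b in basis]
        sq_norm = sum(di * di for di in d)
        sq_proj = sum(p * p for p in proj)
        # Orthogonal distance from the projection plane passing through `center`.
        dist = math.sqrt(max(sq_norm - sq_proj, 0.0))
        # Radial bin of the projected point, clamped to the last bin.
        k = min(int(math.sqrt(sq_proj) / r_max * n_bins), n_bins - 1)
        if dist < h:
            inside[k] += 1
        else:
            outside[k] += 1
    n_in, n_out = sum(inside), sum(outside)
    if n_in == 0 or n_out == 0:
        return 0.0
    # Sum of absolute differences between the two sets of bin proportions.
    return sum(abs(i / n_in - o / n_out) for i, o in zip(inside, outside))


def sample_shell(n, r_min, rng):
    """Rejection-sample n points uniformly from a 3D ball of radius 1,
    keeping only points whose norm is at least r_min."""
    pts = []
    while len(pts) < n:
        p = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        r = math.sqrt(sum(c * c for c in p))
        if r_min <= r <= 1.0:
            pts.append(p)
    return pts


rng = random.Random(0)
hollow = sample_shell(4000, 0.6, rng)  # ball with a hole at its center
solid = sample_shell(4000, 0.0, rng)   # ball without a hole

basis = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # project onto the first two axes
center = (0.0, 0.0, 0.0)

hollow_score = toy_section_index(hollow, basis, center, h=0.1)
solid_score = toy_section_index(solid, basis, center, h=0.1)
print("hollow:", hollow_score, "solid:", solid_score)
```

In the full projection, points above and below the plane fill in the hole, so it is not visible. Inside the thin slice the central bins are nearly empty, which is why the hollow data should score higher than the solid data under this toy measure. A section pursuit search would then vary the projection basis to maximize such a score.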