Paper title
A new effective and efficient measure for outlying aspect mining
Paper authors
Paper abstract
Outlying Aspect Mining (OAM) aims to find the subspaces (a.k.a. aspects) in which a given query is an outlier with respect to a given dataset. Existing OAM algorithms use traditional distance/density-based outlier scores to rank subspaces. Because these scores depend on the dimensionality of subspaces, they cannot be compared directly across subspaces of different dimensionality. $Z$-score normalisation has been used to make them comparable, but it requires computing the outlier scores of all instances in each subspace. This adds significant computational overhead on top of already expensive density estimation, making OAM algorithms infeasible to run on large and/or high-dimensional datasets. We also find that $Z$-score normalisation is inappropriate for OAM in some cases. In this paper, we introduce a new score called SiNNE, which is independent of the dimensionality of subspaces. This enables scores in subspaces of different dimensionalities to be compared directly without any additional normalisation. Our experimental results show that SiNNE produces results that are better than or at least as good as those of existing scores, and that it significantly improves the runtime of an existing OAM algorithm based on beam search.
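To make the cost of the $Z$-score normalisation step concrete, the following is a minimal sketch, not the authors' implementation and not SiNNE. It assumes a plain Gaussian kernel density estimate with a fixed bandwidth as the density-based score; the function names `kde_scores` and `z_normalised_density` and the bandwidth value are hypothetical choices made for illustration. The point it shows is that the normalised score of a single query needs the score of every instance in the subspace.

```python
import numpy as np


def kde_scores(points, bandwidth=0.5):
    """Gaussian kernel density estimate of every row against all rows.

    Assumption: a fixed, hand-picked bandwidth; real OAM scorers choose it
    more carefully. The normalising constant depends on the dimensionality,
    which is one reason raw densities from subspaces of different
    dimensionality are not directly comparable.
    """
    diff = points[:, None, :] - points[None, :, :]   # shape (n, n, d)
    sq_dist = np.sum(diff ** 2, axis=-1)              # shape (n, n)
    d = points.shape[1]
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)).mean(axis=1) / norm


def z_normalised_density(data, query_index, subspace, bandwidth=0.5):
    """Z-score of the query's density within one subspace.

    Scores for all n instances are computed (O(n^2) work here) just to
    obtain the mean and standard deviation used for a single query.
    """
    projected = data[:, list(subspace)]
    scores = kde_scores(projected, bandwidth)
    return (scores[query_index] - scores.mean()) / scores.std()
```

In this sketch a lower (more negative) value means the query lies in a sparser region of the subspace relative to the other instances. A subspace search would repeat the whole computation for every candidate subspace, which is the overhead the abstract refers to.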