On Using Classification Datasets to Evaluate Graph-Level Outlier Detection: Peculiar Observations and New Insights

Paper title

On Using Classification Datasets to Evaluate Graph-Level Outlier Detection: Peculiar Observations and New Insights

Authors

Lingxiao Zhao, Leman Akoglu

Abstract

It is common practice in the outlier mining community to repurpose classification datasets for evaluating various detection models. To that end, a binary classification dataset is often used, where samples from one of the classes are designated as the inlier samples, and the other class is substantially down-sampled to create the ground-truth outlier samples. Graph-level outlier detection (GLOD) is rarely studied but has many potentially influential real-world applications. In this study, we identify an intriguing issue with repurposing graph classification datasets for GLOD. We find that the ROC-AUC performance of the models changes significantly (flips from high to very low, even worse than random) depending on which class is down-sampled. Interestingly, the ROC-AUCs on these two variants approximately sum to 1, and their performance gap is amplified with increasing propagations for a certain family of propagation-based outlier detection models. We carefully study the graph embedding space produced by propagation-based models and find two driving factors: (1) disparity between within-class densities, which is amplified by propagation, and (2) overlapping support (mixing of embeddings) across classes. We also study other graph embedding methods and downstream outlier detectors, and find that the intriguing performance flip issue still widely exists, but which down-sampled variant achieves higher performance may vary. Thoughtful analysis of comprehensive results further deepens our understanding of the established issue.
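
The evaluation protocol described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 1-D "embeddings" are synthetic Gaussian samples standing in for graph embeddings, the detector is a toy k-nearest-neighbour distance score, and the names `make_variant`, `knn_score`, and `evaluate` are hypothetical. One class is deliberately dense and the other sparse with overlapping support, mirroring the two driving factors the abstract names.

```python
import random


def roc_auc(outlier_scores, inlier_scores):
    """Probability that a random outlier scores higher than a random inlier (ties count 0.5)."""
    wins = 0.0
    for o in outlier_scores:
        for i in inlier_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(outlier_scores) * len(inlier_scores))


def make_variant(inlier_class, outlier_class, outlier_rate, rng):
    """Keep every inlier-class sample; keep a small random subset of the other class as outliers."""
    n_out = max(1, round(outlier_rate * len(outlier_class)))
    return list(inlier_class), rng.sample(outlier_class, n_out)


def knn_score(x, data, k=5):
    """Toy detector (assumption): mean distance from x to its k nearest neighbours in data.

    x is expected to be a member of data, so the smallest distance (x to itself) is skipped.
    """
    dists = sorted(abs(x - y) for y in data)
    return sum(dists[1:k + 1]) / k


def evaluate(inlier_class, outlier_class, outlier_rate=0.1, seed=0):
    """Build one down-sampled variant and return the detector's ROC-AUC on it."""
    rng = random.Random(seed)
    inliers, outliers = make_variant(inlier_class, outlier_class, outlier_rate, rng)
    pooled = inliers + outliers  # unsupervised setting: the detector sees unlabeled data
    return roc_auc([knn_score(x, pooled) for x in outliers],
                   [knn_score(x, pooled) for x in inliers])


# Synthetic stand-ins for graph embeddings: same centre, very different densities.
rng = random.Random(42)
dense_class = [rng.gauss(0.0, 0.2) for _ in range(200)]
sparse_class = [rng.gauss(0.0, 2.0) for _ in range(200)]

auc_sparse_down = evaluate(dense_class, sparse_class)  # sparse class down-sampled as outliers
auc_dense_down = evaluate(sparse_class, dense_class)   # dense class down-sampled as outliers
print(f"sparse class as outliers: {auc_sparse_down:.3f}")
print(f"dense class as outliers:  {auc_dense_down:.3f}")
```

With this toy setup the two variants should typically land on opposite sides of 0.5, because a density-style score rates the sparse class as more outlying regardless of which class was labeled the outlier. The sketch only illustrates the protocol; it makes no claim about the models, datasets, or numbers studied in the paper.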
