Title
Weighted Isolation and Random Cut Forest Algorithms for Anomaly Detection
Authors
Abstract
Random cut forest (RCF) algorithms have been developed for anomaly detection, particularly in time series data. The RCF algorithm is an improved version of the isolation forest (IF) algorithm: unlike IF, it can determine whether a real-time input is anomalous by inserting that input into the constructed trees. Various RCF algorithms have been developed, including Robust RCF (RRCF), in which each cut is chosen adaptively and probabilistically. RRCF performs better than IF because it selects the cut dimension according to the geometric range of the data, whereas IF selects the cut dimension at random. However, because the split value is still chosen at random, neither IF nor RRCF takes the overall structure of the data into account. In this paper, we propose new IF and RCF algorithms, referred to as the weighted IF (WIF) and weighted RCF (WRCF) algorithms, respectively, whose split values are determined by the density of the given data. To introduce WIF and WRCF, we first present a new geometric measure, the density measure, on which their construction relies. We establish various mathematical properties of the density measure as theorems and support our claims with numerical examples.
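The abstract contrasts how IF and RRCF choose a cut. The sketch below is a minimal illustration under stated assumptions, not the paper's code: it follows the commonly described IF rule (cut dimension chosen uniformly at random) and RRCF rule (cut dimension chosen with probability proportional to its range), with the split value drawn uniformly over that dimension's range in both cases. The density-based split of WIF and WRCF is deliberately not reproduced, because the density measure is not defined in this abstract. The function names (`bounding_box`, `if_cut`, `rrcf_cut`) and the toy data are hypothetical.

```python
import random


def bounding_box(points):
    """Per-dimension (min, max) of a non-empty list of equal-length tuples."""
    dims = len(points[0])
    return [(min(p[d] for p in points), max(p[d] for p in points)) for d in range(dims)]


def if_cut(points, rng):
    """IF-style cut (assumed rule): pick a dimension uniformly at random,
    then a split value uniformly in [min, max] of that dimension."""
    box = bounding_box(points)
    d = rng.randrange(len(box))
    lo, hi = box[d]
    return d, rng.uniform(lo, hi)


def rrcf_cut(points, rng):
    """RRCF-style cut (assumed rule): pick a dimension with probability
    proportional to its range, then a split value uniformly in [min, max].
    Returns None when every dimension has zero range (no cut is possible)."""
    box = bounding_box(points)
    ranges = [hi - lo for lo, hi in box]
    if sum(ranges) == 0:
        return None
    d = rng.choices(range(len(box)), weights=ranges, k=1)[0]
    lo, hi = box[d]
    return d, rng.uniform(lo, hi)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data: dimension 0 spans [0, 100], dimension 1 spans [0, 1].
    data = [(0.0, 0.0), (100.0, 1.0), (50.0, 0.5)]
    trials = 10_000
    if_dim0 = sum(if_cut(data, rng)[0] == 0 for _ in range(trials)) / trials
    rrcf_dim0 = sum(rrcf_cut(data, rng)[0] == 0 for _ in range(trials)) / trials
    print(f"IF picks dim 0:   {if_dim0:.3f}")    # expected fraction: 1/2
    print(f"RRCF picks dim 0: {rrcf_dim0:.3f}")  # expected fraction: 100/101
```

In both rules the split value is drawn uniformly within the bounding box, which is the limitation the abstract points to: neither rule looks at where the data actually concentrate inside that range.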