Title
Out-of-Bag Anomaly Detection
Authors
Abstract
Data anomalies are ubiquitous in real-world datasets and can have an adverse impact on machine learning (ML) systems, such as automated home valuation. Detecting anomalies could make ML applications more responsible and trustworthy. However, the lack of labels for anomalies and the complex nature of real-world datasets make anomaly detection a challenging unsupervised learning problem. In this paper, we propose a novel model-based anomaly detection method, which we call Out-of-Bag anomaly detection, that handles multi-dimensional datasets consisting of numerical and categorical features. The proposed method decomposes the unsupervised problem into the training of a set of ensemble models. Out-of-Bag estimates are leveraged to derive an effective measure for anomaly detection. We not only demonstrate the state-of-the-art performance of our method through comprehensive experiments on benchmark datasets, but also show, via a case study on home valuation, that our model can improve the accuracy and reliability of an ML system when used as a data pre-processing step.
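The abstract does not specify the ensemble model or the exact anomaly measure, so the following is only a hypothetical Python sketch of the general out-of-bag idea, not the authors' algorithm. It assumes one reading of "decomposes the unsupervised problem into the training of a set of ensemble models": one supervised ensemble per feature, each predicting that feature from the others. As stand-ins it uses bagged k-nearest-neighbour regressors and a median-normalized sum of OOB residuals; the names `oob_anomaly_scores` and `knn_predict`, the parameters `n_bags` and `k`, and the normalization are all illustrative choices. Only numerical features are handled; for a categorical feature, one analogous choice would be 1 minus the OOB-estimated probability of the observed category, which is omitted here.

```python
import random
import statistics


def knn_predict(train_rows, query, target, k):
    """Hypothetical base learner: predict column `target` of `query` as the mean
    of that column over the k training rows closest to `query` on all other
    columns (squared Euclidean distance)."""

    def dist(row):
        return sum((row[c] - query[c]) ** 2 for c in range(len(query)) if c != target)

    nearest = sorted(train_rows, key=dist)[:k]
    return sum(r[target] for r in nearest) / len(nearest)


def oob_anomaly_scores(data, n_bags=30, k=3, seed=0):
    """Illustrative out-of-bag anomaly scores for purely numerical rows.

    For each feature j, a bagged ensemble of k-NN regressors predicts x_j from
    the other features. A row is predicted only by the bags that did not sample
    it (its out-of-bag bags), so it never contributes to its own prediction.
    The score sums, over features, the absolute OOB residual divided by that
    feature's median absolute OOB residual (an assumed normalization).
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    # Draw the bootstrap samples once and reuse them for every feature.
    bags = [[rng.randrange(n) for _ in range(n)] for _ in range(n_bags)]
    members = [set(bag) for bag in bags]
    scores = [0.0] * n
    for j in range(d):
        residuals = []
        for i in range(n):
            preds = [
                knn_predict([data[t] for t in bag], data[i], j, k)
                for bag, in_bag in zip(bags, members)
                if i not in in_bag
            ]
            # A row sampled by every bag has no OOB estimate; treat it as normal.
            residuals.append(abs(data[i][j] - statistics.mean(preds)) if preds else 0.0)
        # Guard against a zero median so the division stays defined.
        scale = statistics.median(residuals) or 1.0
        for i in range(n):
            scores[i] += residuals[i] / scale
    return scores


# Usage: 30 rows on the line y = 2x plus one row that breaks the relationship.
rows = [[x, 2 * x] for x in range(30)] + [[15, 80]]
scores = oob_anomaly_scores(rows)
print(scores.index(max(scores)))  # expected: 30, the injected row
```

Because each row is scored only by bags that never saw it, an anomalous row cannot pull its own prediction toward itself, which is one reason out-of-bag estimates are attractive for this task. With a library such as scikit-learn, a bagged tree ensemble with OOB predictions enabled could replace the hand-rolled k-NN bags.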