On the Robustness of Random Forest Against Untargeted Data Poisoning: An Ensemble-Based Approach

Paper Title

On the Robustness of Random Forest Against Untargeted Data Poisoning: An Ensemble-Based Approach

Authors

Marco Anisetti, Claudio A. Ardagna, Alessandro Balestrucci, Nicola Bena, Ernesto Damiani, Chan Yeob Yeun

Abstract

Machine learning is becoming ubiquitous. From finance to medicine, machine learning models are boosting decision-making processes and even outperforming humans in some tasks. This huge progress in terms of prediction quality does not however find a counterpart in the security of such models and corresponding predictions, where perturbations of fractions of the training set (poisoning) can seriously undermine the model accuracy. Research on poisoning attacks and defenses received increasing attention in the last decade, leading to several promising solutions aiming to increase the robustness of machine learning. Among them, ensemble-based defenses, where different models are trained on portions of the training set and their predictions are then aggregated, provide strong theoretical guarantees at the price of a linear overhead. Surprisingly, ensemble-based defenses, which do not pose any restrictions on the base model, have not been applied to increase the robustness of random forest models. The work in this paper aims to fill in this gap by designing and implementing a novel hash-based ensemble approach that protects random forest against untargeted, random poisoning attacks. An extensive experimental evaluation measures the performance of our approach against a variety of attacks, as well as its sustainability in terms of resource consumption and performance, and compares it with a traditional monolithic model based on random forest. A final discussion presents our main findings and compares our approach with existing poisoning defenses targeting random forests.
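
The abstract describes the general shape of an ensemble-based defense: the training set is split into portions, one model is trained per portion, and the predictions are aggregated. A minimal sketch of that idea follows, assuming hash-based assignment of samples to partitions and majority voting. It is not the authors' implementation. The function names (`partition_index`, `hash_partition`, `train_ensemble`, `predict_majority`), the choice of SHA-256 over the feature values, and the nearest-centroid base learner are all assumptions made for illustration; the base learner is a standard-library stand-in for the random forest that would be trained on each partition in the paper's setting.

```python
import hashlib
from collections import Counter


def partition_index(sample, n_partitions):
    """Map a sample to a partition using a hash of its feature values.

    Assumption: SHA-256 over repr() of the features; the paper's actual
    hash function is not specified in the abstract.
    """
    digest = hashlib.sha256(repr(tuple(sample)).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_partitions


def hash_partition(X, y, n_partitions):
    """Split (X, y) into n_partitions disjoint portions by sample hash."""
    parts = [([], []) for _ in range(n_partitions)]
    for features, label in zip(X, y):
        part_X, part_y = parts[partition_index(features, n_partitions)]
        part_X.append(features)
        part_y.append(label)
    return parts


def train_nearest_centroid(X, y):
    """Stand-in base learner (hypothetical): predicts the closest class centroid.

    In the paper's setting a random forest would be trained here instead.
    """
    sums, counts = {}, Counter()
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    centroids = {
        label: [s / counts[label] for s in acc] for label, acc in sums.items()
    }

    def predict(features):
        return min(
            centroids,
            key=lambda label: sum(
                (a - b) ** 2 for a, b in zip(features, centroids[label])
            ),
        )

    return predict


def train_ensemble(X, y, n_partitions, train_fn=train_nearest_centroid):
    """Train one base model per non-empty partition."""
    return [
        train_fn(part_X, part_y)
        for part_X, part_y in hash_partition(X, y, n_partitions)
        if part_X
    ]


def predict_majority(models, features):
    """Aggregate the base models' predictions by majority vote."""
    votes = Counter(model(features) for model in models)
    return votes.most_common(1)[0][0]
```

Because each sample is assigned to exactly one partition, a poisoned sample can influence only one base model, which is the intuition behind the robustness of this family of defenses. The training cost grows with the number of base models, in line with the linear overhead mentioned in the abstract. Any callable with the same `(X, y) -> predict` shape can be passed as `train_fn`, for example a thin wrapper around a random forest implementation.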
