Paper Title
Towards Data-Free Model Stealing in a Hard Label Setting
Paper Authors
Paper Abstract
Machine learning models deployed as a service (MLaaS) are susceptible to model stealing attacks, where an adversary attempts to replicate the model within a restricted access framework. While existing attacks demonstrate near-perfect clone-model performance using the softmax predictions of the classification network, most APIs expose only the top-1 label. In this work, we show that it is possible to steal machine learning models even when only top-1 predictions are available (Hard-Label setting), without access to model gradients (Black-Box setting) or to the training dataset (Data-Free setting), all within a low query budget. We propose a novel GAN-based framework that trains the student and generator in tandem to steal the model effectively, overcoming the challenge of the hard-label setting by using the gradients of the clone network as a proxy for the victim's gradients. We overcome the large query costs associated with a typical data-free setting by using publicly available (and potentially unrelated) datasets as a weak image prior. We additionally show that even in the absence of such data, state-of-the-art results can be achieved within a low query budget using synthetically crafted samples. We are also the first to demonstrate the scalability of model stealing in a restricted-access setting to a 100-class dataset.
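To make the tandem student/generator training concrete, below is a minimal PyTorch sketch of one plausible realization of the loop the abstract describes: the generator crafts queries, the black-box victim returns only hard (top-1) labels, the student is fit to those labels, and the generator is updated by backpropagating through the clone, since the victim's gradients are unavailable. All names, architectures, losses, and hyperparameters here (Generator, victim_top1, LATENT_DIM, the adversarial generator objective) are illustrative assumptions, not the paper's implementation, which the abstract indicates contains further components this sketch omits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM, NUM_CLASSES, BATCH = 128, 10, 64

class Generator(nn.Module):
    # Hypothetical generator: latent noise -> 32x32 RGB query images.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, 3 * 32 * 32), nn.Tanh(),
        )
    def forward(self, z):
        return self.net(z).view(-1, 3, 32, 32)

# Stand-ins: in practice the victim is an opaque API; here it is a frozen
# random network so the sketch runs end to end.
victim = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, NUM_CLASSES)).eval()
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, NUM_CLASSES))
generator = Generator()
opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)

def victim_top1(images):
    # Hard-label API: returns only the top-1 class index, no softmax scores.
    with torch.no_grad():
        return victim(images).argmax(dim=1)

for step in range(1000):
    # Student step: fit the clone to the victim's hard labels on
    # generator-crafted queries (one black-box API call per batch).
    z = torch.randn(BATCH, LATENT_DIM)
    images = generator(z).detach()
    labels = victim_top1(images)
    loss_s = F.cross_entropy(student(images), labels)
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()

    # Generator step: victim gradients are unavailable, so backpropagate
    # through the clone instead (clone gradients as a proxy) and push the
    # generator toward samples the student still misclassifies.
    z = torch.randn(BATCH, LATENT_DIM)
    images = generator(z)
    labels = victim_top1(images)  # no gradient flows through the victim
    loss_g = -F.cross_entropy(student(images), labels)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

In the abstract's lower-query-cost variants, the pure-noise queries above would be replaced or supplemented by samples drawn from a publicly available (potentially unrelated) dataset acting as a weak image prior, or by synthetically crafted samples when no such data is available.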