Paper Title
Robust and Verifiable Information Embedding Attacks to Deep Neural Networks via Error-Correcting Codes
Authors
Abstract
In the era of deep learning, a user often leverages a third-party machine learning tool to train a deep neural network (DNN) classifier and then deploys the classifier as an end-user software product or a cloud service. In an information embedding attack, the attacker is the provider of a malicious third-party machine learning tool. The attacker embeds a message into the DNN classifier during training and recovers the message by querying the API of the black-box classifier after the user deploys it. Information embedding attacks have attracted growing attention because of their various applications, such as watermarking DNN classifiers and compromising user privacy. State-of-the-art information embedding attacks have two key limitations: 1) they cannot verify the correctness of the recovered message, and 2) they are not robust against post-processing of the classifier. In this work, we aim to design information embedding attacks that are verifiable and robust against popular post-processing methods. Specifically, we leverage a Cyclic Redundancy Check (CRC) to verify the correctness of the recovered message. Moreover, to be robust against post-processing, we leverage Turbo codes, a type of error-correcting code, to encode the message before embedding it into the DNN classifier. To save queries, we propose recovering the message by querying the classifier adaptively. Our adaptive recovery strategy leverages the property of Turbo codes that supports error correction with a partial code. We evaluate our information embedding attacks using simulated messages and apply them to three applications in which the messages have semantic interpretations. We consider eight popular methods for post-processing the classifier. Our results show that our attacks can accurately and verifiably recover the messages in all considered scenarios, while state-of-the-art attacks cannot accurately recover the messages in many scenarios.
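The verification step mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify which CRC variant or message framing is used, so the code below assumes CRC-32 from the standard `zlib` module and a 4-byte big-endian checksum appended to the message. The helper names `attach_crc`, `verify_crc`, and `flip_bit` are hypothetical and chosen only for illustration. Turbo encoding and the embedding into the classifier are not shown.

```python
import zlib

CRC_BYTES = 4  # assumption: CRC-32, i.e. a 32-bit checksum


def attach_crc(message: bytes) -> bytes:
    """Append a big-endian CRC-32 checksum to the message before embedding."""
    checksum = zlib.crc32(message)
    return message + checksum.to_bytes(CRC_BYTES, "big")


def verify_crc(recovered: bytes):
    """Return the payload if its checksum matches, otherwise None."""
    if len(recovered) <= CRC_BYTES:
        return None  # too short to hold a payload plus checksum
    payload, tail = recovered[:-CRC_BYTES], recovered[-CRC_BYTES:]
    if zlib.crc32(payload).to_bytes(CRC_BYTES, "big") == tail:
        return payload
    return None


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate a recovery error by flipping one bit (hypothetical helper)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


if __name__ == "__main__":
    framed = attach_crc(b"secret message")
    print(verify_crc(framed))               # checksum matches: payload returned
    print(verify_crc(flip_bit(framed, 3)))  # one corrupted bit: None
```

In this sketch, a recovered message that fails the check is simply rejected. In an adaptive recovery loop, such a failure would be the signal to query the classifier further before decoding again.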