Paper Title

Reducing Certified Regression to Certified Classification for General Poisoning Attacks

Authors

Zayd Hammoudeh, Daniel Lowd

Abstract

Adversarial training instances can severely distort a model's behavior. This work investigates certified regression defenses, which provide guaranteed limits on how much a regressor's prediction may change under a poisoning attack. Our key insight is that certified regression reduces to voting-based certified classification when using median as a model's primary decision function. Coupling our reduction with existing certified classifiers, we propose six new regressors provably robust to poisoning attacks. To the best of our knowledge, this is the first work that certifies the robustness of individual regression predictions without any assumptions about the data distribution and model architecture. We also show that the assumptions made by existing state-of-the-art certified classifiers are often overly pessimistic. We introduce a tighter analysis of model robustness, which in many cases results in significantly improved certified guarantees. Lastly, we empirically demonstrate our approaches' effectiveness on both regression and classification data, where the accuracy of up to 50% of test predictions can be guaranteed under 1% training set corruption and up to 30% of predictions under 4% corruption. Our source code is available at https://github.com/ZaydH/certified-regression.
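
To illustrate the reduction described above, here is a minimal sketch, assuming a partition-based ensemble (one of several certified classifiers the paper could be coupled with) and scikit-learn's Ridge as a placeholder base regressor. The function names, partition count, and tolerance parameter are illustrative assumptions, not the paper's API or its exact certification procedure.

```python
# Minimal sketch (assumptions, not the paper's implementation): partition the
# training set, train one sub-regressor per partition, and return the MEDIAN of
# the sub-models' predictions. Each sub-model effectively "votes" on whether the
# final prediction stays within a tolerance interval, so bounding how far the
# median can move reduces to the voting argument used by partition-based
# certified classifiers.

import numpy as np
from sklearn.linear_model import Ridge  # placeholder base regressor


def train_partitioned_regressors(X, y, n_partitions=51, seed=0):
    """Train one sub-regressor per disjoint training partition (odd count assumed).

    In practice the partition should be a deterministic function of each sample
    (e.g., a hash) so that a poisoned sample affects only one sub-model; a seeded
    shuffle is used here only for brevity.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    return [Ridge().fit(X[part], y[part])
            for part in np.array_split(idx, n_partitions)]


def certified_median_prediction(models, x, tolerance):
    """Return the median prediction and a lower bound on how many sub-models an
    attacker must corrupt to push the median outside [median - tol, median + tol].

    Because each poisoned training sample can change at most one sub-model under
    a deterministic partition, this count also bounds the poisoning budget."""
    preds = np.array([m.predict(x.reshape(1, -1))[0] for m in models])
    k = len(preds)
    med = np.median(preds)
    lo, hi = med - tolerance, med + tolerance
    below = int(np.sum(preds < lo))   # sub-models already "voting" below the interval
    above = int(np.sum(preds > hi))   # sub-models already "voting" above the interval
    majority = k // 2 + 1             # votes needed on one side to drag the median out
    flips_needed = majority - max(below, above)
    return med, max(flips_needed, 0)
```

For an odd number of sub-models, the returned count is the number of "votes" that would have to flip before the median can leave the tolerance interval, which is how a regression certificate becomes a classification-style robustness count in this sketch.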
