An Embarrassingly Simple Backdoor Attack on Self-supervised Learning

Paper Title


An Embarrassingly Simple Backdoor Attack on Self-supervised Learning

Authors

Li, Changjiang; Pang, Ren; Xi, Zhaohan; Du, Tianyu; Ji, Shouling; Yao, Yuan; Wang, Ting

Abstract


As a new paradigm in machine learning, self-supervised learning (SSL) is capable of learning high-quality representations of complex data without relying on labels. In addition to eliminating the need for labeled data, research has found that SSL improves adversarial robustness over supervised learning, since the lack of labels makes it more challenging for adversaries to manipulate model predictions. However, the extent to which this robustness advantage generalizes to other types of attacks remains an open question. We explore this question in the context of backdoor attacks. Specifically, we design and evaluate CTRL, an embarrassingly simple yet highly effective self-supervised backdoor attack. By polluting only a tiny fraction of the training data (≤ 1%) with indistinguishable poisoning samples, CTRL causes any trigger-embedded input to be misclassified into the adversary's designated class with high probability (≥ 99%) at inference time. Our findings suggest that SSL and supervised learning are comparably vulnerable to backdoor attacks. More importantly, through the lens of CTRL, we study the inherent vulnerability of SSL to backdoor attacks. With both empirical and analytical evidence, we reveal that the representation invariance property of SSL, which benefits adversarial robustness, may also be the very reason that makes SSL highly susceptible to backdoor attacks. Our findings also imply that existing defenses against supervised backdoor attacks are not easily retrofitted to the unique vulnerability of SSL.
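The two quantities the abstract reports, a poisoning budget of at most 1% of the training data and an attack success rate measured on trigger-embedded inputs, can be illustrated with a minimal sketch. This is not CTRL's actual trigger design, which the abstract does not describe. `embed_trigger`, `poison_dataset`, and `attack_success_rate` are hypothetical helpers. The sketch also makes three simplifying assumptions: images are flat lists of intensities in [0, 1], the trigger is a generic small additive perturbation scaled by `eps`, and the attacker can pick poisoning samples from the target class.

```python
import random
from typing import Callable, List, Sequence, Set, Tuple

# Assumption: an "image" is a flat list of pixel intensities in [0, 1].
Image = List[float]


def embed_trigger(x: Image, trigger: Sequence[float], eps: float = 0.02) -> Image:
    """Hypothetical trigger: add a small bounded perturbation (at most eps per pixel),
    clipped to [0, 1], so the poisoned sample stays visually close to x."""
    return [min(1.0, max(0.0, xi + eps * ti)) for xi, ti in zip(x, trigger)]


def poison_dataset(
    data: List[Tuple[Image, int]],
    target_class: int,
    trigger: Sequence[float],
    budget: float = 0.01,
    seed: int = 0,
) -> Tuple[List[Tuple[Image, int]], Set[int]]:
    """Poison at most `budget` of the dataset, drawing only from target-class samples
    (an assumption of this sketch). Labels are left unchanged; the SSL encoder never
    sees them, so the class index only tells the attacker which samples to pick."""
    rng = random.Random(seed)
    n_poison = int(budget * len(data))
    candidates = [i for i, (_, y) in enumerate(data) if y == target_class]
    chosen = set(rng.sample(candidates, min(n_poison, len(candidates))))
    poisoned = [
        (embed_trigger(x, trigger) if i in chosen else x, y)
        for i, (x, y) in enumerate(data)
    ]
    return poisoned, chosen


def attack_success_rate(
    predict: Callable[[Image], int],
    inputs: List[Image],
    trigger: Sequence[float],
    target_class: int,
) -> float:
    """Fraction of trigger-embedded inputs (assumed to come from non-target classes)
    that the downstream classifier assigns to target_class."""
    if not inputs:
        return 0.0
    hits = sum(predict(embed_trigger(x, trigger)) == target_class for x in inputs)
    return hits / len(inputs)
```

With 1,000 training samples and `budget=0.01`, `poison_dataset` modifies exactly 10 target-class samples and leaves every label intact. `attack_success_rate` is the quantity the abstract reports as ≥ 99% for CTRL. The `predict` argument stands in for whatever downstream classifier is trained on top of the (possibly backdoored) SSL encoder.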


