ESTAS: Effective and Stable Trojan Attacks in Self-supervised Encoders with One Target Unlabelled Sample

Paper Title

ESTAS: Effective and Stable Trojan Attacks in Self-supervised Encoders with One Target Unlabelled Sample

Authors

Jiaqi Xue, Qian Lou

Abstract

Self-supervised learning (SSL) has emerged as a popular approach to image representation learning: it removes the reliance on labeled data and learns rich representations from large-scale, ubiquitous unlabeled data. One can then train a downstream classifier on top of the pre-trained SSL image encoder with few or no labeled downstream data. Although extensive work shows that SSL achieves remarkable and competitive performance on a range of downstream tasks, its security concerns, e.g., Trojan attacks on SSL encoders, are still not well studied. In this work, we present a novel Trojan attack method, denoted ESTAS, that enables an effective and stable attack on SSL encoders with only one unlabeled target sample. In particular, we propose consistent trigger poisoning and cascade optimization in ESTAS to improve attack efficacy and model accuracy, and to eliminate the expensive extraction of target-class samples from large-scale, disordered unlabeled data. Our extensive experiments on multiple datasets show that ESTAS stably achieves a > 99% attack success rate (ASR) with one target-class sample. Compared to prior work, ESTAS attains a > 30% ASR increase and a > 8.3% accuracy improvement on average.
