Paper Title
Backdoor Defense via Decoupling the Training Process
Paper Authors
Paper Abstract
Recent studies have revealed that deep neural networks (DNNs) are vulnerable to backdoor attacks, where attackers embed hidden backdoors in the DNN model by poisoning a few training samples. The attacked model behaves normally on benign samples, whereas its prediction will be maliciously changed when the backdoor is activated. We reveal that poisoned samples tend to cluster together in the feature space of the attacked DNN model, which is mostly due to the end-to-end supervised training paradigm. Inspired by this observation, we propose a novel backdoor defense via decoupling the original end-to-end training process into three stages. Specifically, we first learn the backbone of a DNN model via \emph{self-supervised learning} based on training samples without their labels. The learned backbone will map samples with the same ground-truth label to similar locations in the feature space. Then, we freeze the parameters of the learned backbone and train the remaining fully connected layers via standard training with all (labeled) training samples. Lastly, to further alleviate side-effects of poisoned samples in the second stage, we remove labels of some `low-credible' samples determined based on the learned model and conduct a \emph{semi-supervised fine-tuning} of the whole model. Extensive experiments on multiple benchmark datasets and DNN models verify that the proposed defense is effective in reducing backdoor threats while preserving high accuracy in predicting benign samples. Our code is available at \url{https://github.com/SCLBD/DBD}.
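The abstract describes a three-stage pipeline: (1) self-supervised learning of the backbone without labels, (2) training the classification head on the frozen backbone with all labels, and (3) semi-supervised fine-tuning of the whole model after discarding the labels of "low-credible" samples. The sketch below is a minimal illustration of that pipeline, not the authors' implementation (see \url{https://github.com/SCLBD/DBD} for the official code); the data loaders, the self-supervised loss, the credibility criterion (per-sample loss quantile), and all hyper-parameters are assumptions made for illustration.

```python
# Minimal sketch of the three-stage decoupled training described in the abstract.
# Data loaders, the SSL loss, hyper-parameters, and the credibility criterion are
# illustrative assumptions, not the authors' exact implementation.
import torch
import torch.nn as nn
import torchvision


def build_model(num_classes=10):
    net = torchvision.models.resnet18(num_classes=num_classes)
    head = net.fc              # classification head (fully connected layer)
    net.fc = nn.Identity()     # the rest of the network acts as the backbone
    return net, head


# --- Stage 1: self-supervised pre-training of the backbone (labels discarded) ---
def stage1_self_supervised(backbone, unlabeled_loader, ssl_loss, epochs=100):
    opt = torch.optim.SGD(backbone.parameters(), lr=0.4, momentum=0.9)
    for _ in range(epochs):
        # assumed loader format: two augmented views per image, labels ignored
        for (view1, view2), _ in unlabeled_loader:
            z1, z2 = backbone(view1), backbone(view2)
            loss = ssl_loss(z1, z2)   # e.g. a SimCLR / NT-Xent-style contrastive loss
            opt.zero_grad(); loss.backward(); opt.step()


# --- Stage 2: train only the classification head on the frozen backbone ---
def stage2_linear(backbone, head, labeled_loader, epochs=10):
    for p in backbone.parameters():
        p.requires_grad_(False)
    opt = torch.optim.SGD(head.parameters(), lr=0.1, momentum=0.9)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in labeled_loader:
            with torch.no_grad():
                feat = backbone(x)
            loss = ce(head(feat), y)
            opt.zero_grad(); loss.backward(); opt.step()


# --- Stage 3: split samples by credibility for semi-supervised fine-tuning ---
def credible_label_mask(backbone, head, labeled_loader, keep_ratio=0.5):
    """Mark the samples with the largest per-sample loss as 'low-credible';
    their labels are dropped before fine-tuning the whole model with a
    semi-supervised method (e.g. a MixMatch-style learner)."""
    ce = nn.CrossEntropyLoss(reduction="none")
    losses = []
    with torch.no_grad():
        for x, y in labeled_loader:
            losses.append(ce(head(backbone(x)), y))
    losses = torch.cat(losses)
    threshold = losses.quantile(keep_ratio)
    return losses <= threshold   # True = keep this sample's label
```

In this reading, the key design choice is that poisoned samples cannot rely on their (attacker-assigned) labels during Stage 1, so the backbone does not cluster them by trigger; Stage 3 then uses the resulting model to filter out the remaining suspicious labels before fine-tuning.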