Paper Title

Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning

Paper Authors

Dongze Lian, Daquan Zhou, Jiashi Feng, Xinchao Wang

Paper Abstract

Existing fine-tuning methods either tune all parameters of the pre-trained model (full fine-tuning), which is not efficient, or only tune the last linear layer (linear probing), which suffers a significant accuracy drop compared to full fine-tuning. In this paper, we propose a new parameter-efficient fine-tuning method termed SSF, representing that researchers only need to Scale and Shift the deep Features extracted by a pre-trained model to catch up with the performance of full fine-tuning. In this way, SSF surprisingly outperforms other parameter-efficient fine-tuning approaches even with a smaller number of tunable parameters. Furthermore, different from some existing parameter-efficient fine-tuning methods (e.g., Adapter or VPT) that introduce extra parameters and computational cost in both the training and inference stages, SSF only adds learnable parameters during the training stage, and these additional parameters can be merged into the original pre-trained model weights via re-parameterization in the inference phase. With the proposed SSF, our model obtains 2.46% (90.72% vs. 88.54%) and 11.48% (73.10% vs. 65.57%) performance improvements on FGVC and VTAB-1k, respectively, in terms of Top-1 accuracy compared to full fine-tuning, while tuning only about 0.3M parameters. We also conduct extensive experiments across various model families (CNNs, Transformers, and MLPs) and datasets. Results on a total of 26 image classification datasets and 3 robustness & out-of-distribution datasets show the effectiveness of SSF. Code is available at https://github.com/dongzelian/SSF.
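The abstract describes two mechanisms: learning per-feature scale and shift factors on top of a frozen pre-trained backbone during training, and folding those factors back into the original weights at inference via re-parameterization. The snippet below is a minimal PyTorch sketch of that idea only; the class `SSF` and the helper `merge_into_linear` are illustrative names under assumed shapes, not the authors' released implementation (see the GitHub repository above for the official code).

```python
import torch
import torch.nn as nn

class SSF(nn.Module):
    """Minimal sketch: scale and shift the features produced by a
    frozen pre-trained layer (illustrative, not the official code)."""

    def __init__(self, dim: int):
        super().__init__()
        # Learnable per-channel scale (gamma) and shift (beta),
        # initialized to the identity transform so tuning starts
        # from the pre-trained behaviour.
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., dim) features from the frozen backbone layer.
        return x * self.gamma + self.beta


def merge_into_linear(linear: nn.Linear, ssf: SSF) -> nn.Linear:
    """Re-parameterization sketch: fold an SSF module applied after a
    linear layer back into that layer, so inference keeps the original
    architecture with no extra parameters or compute."""
    with torch.no_grad():
        merged = nn.Linear(linear.in_features, linear.out_features, bias=True)
        # gamma * (W x + b) + beta == (gamma ⊙ W) x + (gamma * b + beta)
        merged.weight.copy_(linear.weight * ssf.gamma.unsqueeze(1))
        bias = linear.bias if linear.bias is not None else torch.zeros_like(ssf.beta)
        merged.bias.copy_(ssf.gamma * bias + ssf.beta)
    return merged
```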
