Paper Title
Predicting Fine-Tuning Performance with Probing
Paper Authors
Paper Abstract
Large NLP models have recently shown impressive performance in language understanding tasks, typically evaluated by their fine-tuned performance. Alternatively, probing has received increasing attention as a lightweight method for interpreting the intrinsic mechanisms of large NLP models. In probing, post-hoc classifiers are trained on "out-of-domain" datasets that diagnose specific abilities. While probing language models has led to insightful findings, these analyses appear disconnected from the development of the models themselves. This paper explores the utility of probing deep NLP models to extract a proxy signal widely used in model development -- the fine-tuning performance. We find that it is possible to use the accuracies of only three probing tests to predict the fine-tuning performance with errors $40\%$ to $80\%$ smaller than baselines. We further discuss possible avenues where probing can empower the development of deep NLP models.
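The abstract's core idea -- using a handful of probing-test accuracies as features to predict fine-tuning performance -- can be sketched as follows. This is a minimal illustration, not the authors' exact method: the probe task names, the toy accuracy numbers, and the choice of a plain linear regressor are all assumptions made for the example.

```python
# Sketch: predict fine-tuning accuracy from three probing-test accuracies.
# All numbers and task names below are illustrative placeholders, not data
# from the paper.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Suppose we have several model checkpoints. For each, we record the
# accuracy of three probing tests (hypothetical diagnostic tasks) and the
# fine-tuning accuracy on a downstream task, measured offline.
probe_acc = np.array([
    [0.71, 0.64, 0.58],   # checkpoint 1: [probe_A, probe_B, probe_C]
    [0.75, 0.69, 0.61],   # checkpoint 2
    [0.80, 0.72, 0.66],   # checkpoint 3
    [0.83, 0.77, 0.70],   # checkpoint 4 (held out)
])
finetune_acc = np.array([0.62, 0.68, 0.74, 0.79])

# Fit a simple linear predictor on the first three checkpoints:
# fine-tuning accuracy as a function of the three probing accuracies.
predictor = LinearRegression().fit(probe_acc[:3], finetune_acc[:3])

# Estimate the fine-tuning performance of the held-out checkpoint
# without actually fine-tuning it.
pred = predictor.predict(probe_acc[3:])
print("predicted: %.3f  actual: %.3f" % (pred[0], finetune_acc[3]))
print("abs error: %.3f" % mean_absolute_error(finetune_acc[3:], pred))
```

The appeal of this setup is cost: each probing test trains only a small post-hoc classifier on frozen representations, so the regressor's inputs are far cheaper to obtain than running full fine-tuning for every checkpoint.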