Paper Title

Uncertainty Quantification with Pre-trained Language Models: A Large-Scale Empirical Analysis

Authors

Yuxin Xiao, Paul Pu Liang, Umang Bhatt, Willie Neiswanger, Ruslan Salakhutdinov, Louis-Philippe Morency

Abstract

Pre-trained language models (PLMs) have gained increasing popularity due to their compelling prediction performance in diverse natural language processing (NLP) tasks. When formulating a PLM-based prediction pipeline for NLP tasks, it is also crucial for the pipeline to minimize calibration error, especially in safety-critical applications. That is, the pipeline should reliably indicate when we can trust its predictions. In particular, there are various considerations behind the pipeline: (1) the choice and (2) the size of the PLM, (3) the choice of uncertainty quantifier, (4) the choice of fine-tuning loss, and many more. Although prior work has looked into some of these considerations, it usually draws conclusions from a limited scope of empirical studies. A holistic analysis of how to compose a well-calibrated PLM-based prediction pipeline is still lacking. To fill this void, we compare a wide range of popular options for each consideration across three prevalent NLP classification tasks and a domain-shift setting. Based on these results, we recommend the following: (1) use ELECTRA for PLM encoding, (2) use larger PLMs if possible, (3) use Temp Scaling as the uncertainty quantifier, and (4) use Focal Loss for fine-tuning.
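The last two recommendations name standard, well-defined techniques. As a minimal sketch (not the paper's implementation): temperature scaling fits a single scalar T > 0 on held-out validation logits to minimize negative log-likelihood and then divides all test logits by T, while focal loss down-weights easy, already-confident examples via a (1 - p_t)^gamma factor. The NumPy code below illustrates both; the grid-search fit for T is an assumed simplification (gradient-based fitting is more common in practice), and all function names are illustrative.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T=1.0):
    # Mean negative log-likelihood of the true labels at temperature T.
    p = softmax(logits / T)
    return float(np.mean(-np.log(p[np.arange(len(labels)), labels] + 1e-12)))

def fit_temperature(val_logits, val_labels):
    # Temperature scaling: pick the scalar T minimizing validation NLL.
    # A coarse grid search stands in for the usual gradient-based fit.
    grid = np.linspace(0.25, 5.0, 96)
    losses = [nll(val_logits, val_labels, T) for T in grid]
    return float(grid[int(np.argmin(losses))])

def focal_loss(logits, labels, gamma=2.0):
    # Focal loss: -(1 - p_t)^gamma * log(p_t), averaged over examples.
    # gamma = 0 recovers ordinary cross-entropy.
    p = softmax(logits)
    pt = p[np.arange(len(labels)), labels]
    return float(np.mean(-((1.0 - pt) ** gamma) * np.log(pt + 1e-12)))
```

On an overconfident model (confident logits but noisy validation labels), the fitted T comes out above 1, which softens the predicted probabilities at test time without changing the argmax prediction; focal loss likewise shrinks the contribution of confidently correct examples relative to plain cross-entropy.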
