Paper Title

Uncertainty Quantification with Pre-trained Language Models: A Large-Scale Empirical Analysis

Authors

Yuxin Xiao, Paul Pu Liang, Umang Bhatt, Willie Neiswanger, Ruslan Salakhutdinov, Louis-Philippe Morency

Abstract

Pre-trained language models (PLMs) have gained increasing popularity due to their compelling prediction performance in diverse natural language processing (NLP) tasks. When formulating a PLM-based prediction pipeline for NLP tasks, it is also crucial for the pipeline to minimize calibration error, especially in safety-critical applications. That is, the pipeline should reliably indicate when we can trust its predictions. In particular, there are various considerations behind the pipeline: (1) the choice and (2) the size of the PLM, (3) the choice of uncertainty quantifier, (4) the choice of fine-tuning loss, and many more. Although prior work has looked into some of these considerations, it usually draws conclusions from a limited scope of empirical studies. A holistic analysis of how to compose a well-calibrated PLM-based prediction pipeline is still lacking. To fill this void, we compare a wide range of popular options for each consideration across three prevalent NLP classification tasks and a domain-shift setting. Based on these results, we recommend the following: (1) use ELECTRA for PLM encoding, (2) use larger PLMs if possible, (3) use Temp Scaling as the uncertainty quantifier, and (4) use Focal Loss for fine-tuning.
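The last two recommendations name standard, well-defined techniques. As a minimal sketch (not the paper's implementation): temperature scaling fits a single scalar T > 0 on held-out validation logits to minimize negative log-likelihood and then divides all test logits by T, while focal loss down-weights easy, already-confident examples via a (1 - p_t)^gamma factor. The NumPy code below illustrates both; the grid-search fit for T is an assumed simplification (gradient-based fitting is more common in practice), and all function names are illustrative.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T=1.0):
    # Mean negative log-likelihood of the true labels at temperature T.
    p = softmax(logits / T)
    return float(np.mean(-np.log(p[np.arange(len(labels)), labels] + 1e-12)))

def fit_temperature(val_logits, val_labels):
    # Temperature scaling: pick the scalar T minimizing validation NLL.
    # A coarse grid search stands in for the usual gradient-based fit.
    grid = np.linspace(0.25, 5.0, 96)
    losses = [nll(val_logits, val_labels, T) for T in grid]
    return float(grid[int(np.argmin(losses))])

def focal_loss(logits, labels, gamma=2.0):
    # Focal loss: -(1 - p_t)^gamma * log(p_t), averaged over examples.
    # gamma = 0 recovers ordinary cross-entropy.
    p = softmax(logits)
    pt = p[np.arange(len(labels)), labels]
    return float(np.mean(-((1.0 - pt) ** gamma) * np.log(pt + 1e-12)))
```

On an overconfident model (confident logits but noisy validation labels), the fitted T comes out above 1, which softens the predicted probabilities at test time without changing the argmax prediction; focal loss likewise shrinks the contribution of confidently correct examples relative to plain cross-entropy.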
