Paper Title

ALERT: Adapting Language Models to Reasoning Tasks

Paper Authors

Ping Yu, Tianlu Wang, Olga Golovneva, Badr AlKhamissi, Siddharth Verma, Zhijing Jin, Gargi Ghosh, Mona Diab, Asli Celikyilmaz

Paper Abstract

Current large language models can perform reasonably well on complex tasks that require step-by-step reasoning with few-shot learning. Are these models applying reasoning skills learned during pre-training to reason outside of their training context, or are they simply memorizing their training corpus at a finer granularity and learning to better understand their context? To tease apart these possibilities, we introduce ALERT, a benchmark and suite of analyses for assessing language models' reasoning ability by comparing pre-trained and finetuned models on complex tasks that require reasoning skills to solve. ALERT provides a test bed to assess any language model on fine-grained reasoning skills; it spans 20 datasets and covers 10 different reasoning skills. We leverage ALERT to further investigate the role of finetuning. Through extensive empirical analysis, we find that language models acquire more reasoning skills, such as textual entailment, abductive reasoning, and analogical reasoning, during the finetuning stage than during pre-training. We also find that when language models are finetuned they tend to overfit to the prompt template, which hurts model robustness and causes generalization problems.
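
To make the evaluation idea concrete, below is a minimal illustrative sketch of how one might aggregate per-dataset accuracies into per-skill scores when comparing a pretrained and a finetuned checkpoint. This is not the paper's released code; the dataset names, skill labels, and numbers are hypothetical placeholders standing in for ALERT's 20 datasets and 10 skills.

```python
# Hypothetical sketch: roll up per-dataset accuracy into per-skill scores
# for two model checkpoints, in the spirit of ALERT's skill-level comparison.
# Dataset names, skill labels, and accuracy values are illustrative placeholders.
from collections import defaultdict

# Map each evaluation dataset to the reasoning skill it probes (assumed mapping).
SKILL_OF_DATASET = {
    "entailment_ds": "textual entailment",
    "abductive_ds": "abductive reasoning",
    "analogy_ds": "analogical reasoning",
}

# Per-dataset accuracy for a pretrained vs. a finetuned checkpoint (toy numbers).
results = {
    "pretrained": {"entailment_ds": 0.41, "abductive_ds": 0.38, "analogy_ds": 0.35},
    "finetuned":  {"entailment_ds": 0.58, "abductive_ds": 0.52, "analogy_ds": 0.47},
}

def skill_scores(per_dataset_acc):
    """Average dataset accuracies within each reasoning-skill bucket."""
    buckets = defaultdict(list)
    for dataset, acc in per_dataset_acc.items():
        buckets[SKILL_OF_DATASET[dataset]].append(acc)
    return {skill: sum(accs) / len(accs) for skill, accs in buckets.items()}

for model_name, per_dataset in results.items():
    print(model_name, skill_scores(per_dataset))
```

In practice one would also run each dataset under several prompt-template variants and report the spread across templates, since the abstract notes that finetuned models tend to overfit to a specific prompt template.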
