Paper Title
Paragraph-based Transformer Pre-training for Multi-Sentence Inference
Paper Authors
Paper Abstract
Inference tasks such as answer sentence selection (AS2) or fact verification are typically solved by fine-tuning transformer-based models as individual sentence-pair classifiers. Recent studies show that these tasks benefit from modeling dependencies across multiple candidate sentences jointly. In this paper, we first show that popular pre-trained transformers perform poorly when fine-tuned on multi-candidate inference tasks. We then propose a new pre-training objective that models the paragraph-level semantics across multiple input sentences. Our evaluation on three AS2 datasets and one fact verification dataset demonstrates the superiority of our pre-training technique over traditional ones, both when the transformers are used as joint models for multi-candidate inference tasks and when they are used as cross-encoders for sentence-pair formulations of these tasks. Our code and pre-trained models are released at https://github.com/amazon-research/wqa-multi-sentence-inference.
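To make the contrast between the two formulations concrete, below is a minimal sketch (not the authors' implementation, and not their pre-training objective) of how the same question and candidate answer sentences can be encoded either as independent sentence pairs (cross-encoder style) or jointly in a single sequence so self-attention can model cross-candidate dependencies. It assumes a HuggingFace-style API; the model name, question, and candidates are placeholders.

```python
# Illustrative sketch: pair-wise vs. joint multi-candidate encoding for AS2.
# Assumes the `transformers` library; "roberta-base" is a placeholder checkpoint.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")

question = "Who wrote the opera Carmen?"            # hypothetical example
candidates = [                                       # hypothetical candidate sentences
    "Carmen is an opera by the French composer Georges Bizet.",
    "The opera premiered in Paris in 1875.",
    "Bizet died three months after the premiere.",
]

# Sentence-pair (cross-encoder) formulation: each candidate is scored
# independently against the question, one sequence per (question, candidate) pair.
pair_batch = tokenizer(
    [question] * len(candidates), candidates,
    padding=True, truncation=True, return_tensors="pt",
)
with torch.no_grad():
    # One [CLS]-position vector per pair, which a classifier head would score.
    pair_reprs = model(**pair_batch).last_hidden_state[:, 0]

# Joint multi-candidate formulation: the question and all candidates share one
# input sequence, so attention can model dependencies across candidates.
joint_text = question + tokenizer.sep_token + tokenizer.sep_token.join(candidates)
joint_batch = tokenizer(joint_text, truncation=True, return_tensors="pt")
with torch.no_grad():
    joint_reprs = model(**joint_batch).last_hidden_state

print(pair_reprs.shape)   # (num_candidates, hidden_size)
print(joint_reprs.shape)  # (1, joint_sequence_length, hidden_size)
```

In the joint formulation, per-candidate scores would typically be read off the representations at each candidate's boundary token; the paper's contribution is a paragraph-level pre-training objective that prepares the transformer for exactly this kind of multi-sentence input.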