Paper Title

Modeling Preconditions in Text with a Crowd-sourced Dataset

Authors

Heeyoung Kwon, Mahnaz Koupaee, Pratyush Singh, Gargi Sawhney, Anmol Shukla, Keerthi Kumar Kallur, Nathanael Chambers, Niranjan Balasubramanian

Abstract

Preconditions provide a form of logical connection between events: they explain why some events occur together and convey information complementary to more widely studied relations such as causation, temporal ordering, entailment, and discourse relations. Modeling preconditions in text has been hampered in part by the lack of large-scale labeled data grounded in text. This paper introduces PeKo, a crowd-sourced annotation of preconditions between event pairs in newswire text, an order of magnitude larger than prior text annotations. To complement this new corpus, we also introduce two challenge tasks aimed at modeling preconditions: (i) Precondition Identification -- a standard classification task defined over pairs of event mentions, and (ii) Precondition Generation -- a generative task aimed at testing a more general ability to reason about a given event. Evaluation on both tasks shows that modeling preconditions is challenging even for today's large language models (LMs), suggesting that precondition knowledge is not easily accessible in LM-derived representations alone. Our generation results show that fine-tuning an LM on PeKo yields better conditional relations than training on raw text or temporally-ordered corpora.
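
To make the two task framings concrete, the sketch below shows one plausible way to set them up with off-the-shelf pretrained models. This is a minimal illustration, not the authors' implementation: the model names (`bert-base-uncased`, `gpt2`), the event-marker tokens, the label scheme, and the prompt format are all assumptions, and the models would need fine-tuning on PeKo before their outputs were meaningful.

```python
# A minimal sketch (not the authors' released code) of how the two PeKo tasks
# could be framed with off-the-shelf Hugging Face models. Model names, the
# event-marker scheme, and the prompt format are illustrative assumptions.
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    AutoModelForCausalLM,
)

# --- Task (i): Precondition Identification as event-pair classification ---
clf_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
clf = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # 1 = precondition, 0 = not (assumed labels)
)
# Mark the two event mentions in context with assumed special tokens.
clf_tok.add_special_tokens(
    {"additional_special_tokens": ["[E1]", "[/E1]", "[E2]", "[/E2]"]}
)
clf.resize_token_embeddings(len(clf_tok))

sent = "The company [E1] obtained [/E1] permits before it [E2] built [/E2] the plant."
with torch.no_grad():
    logits = clf(**clf_tok(sent, return_tensors="pt")).logits
# The classification head is randomly initialized, so this prediction is
# arbitrary until the model is fine-tuned on PeKo's labeled pairs.
print("precondition?", bool(logits.argmax(-1).item()))

# --- Task (ii): Precondition Generation with a causal LM ---
gen_tok = AutoTokenizer.from_pretrained("gpt2")
gen = AutoModelForCausalLM.from_pretrained("gpt2")
prompt = "Target event: the plant was built. A precondition event:"
out = gen.generate(
    **gen_tok(prompt, return_tensors="pt"),
    max_new_tokens=20,
    do_sample=False,
    pad_token_id=gen_tok.eos_token_id,
)
print(gen_tok.decode(out[0], skip_special_tokens=True))
```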
