Paper Title
CORWA: A Citation-Oriented Related Work Annotation Dataset
Paper Authors
Paper Abstract
Academic research is an exploratory activity to discover new solutions to problems. By its nature, academic research work performs literature reviews to distinguish its novelty from prior work. In natural language processing, this literature review is usually conducted in the "Related Work" section. The task of related work generation aims to automatically generate the related work section given the rest of the research paper and a list of papers to cite. Prior work on this task has focused on the sentence as the basic unit of generation, neglecting the fact that related work sections consist of variable-length text fragments derived from different information sources. As a first step toward a linguistically motivated related work generation framework, we present the Citation-Oriented Related Work Annotation (CORWA) dataset, which labels different types of citation text fragments from different information sources. We train a strong baseline model that automatically tags the CORWA labels on massive unlabeled related work section texts. We further suggest a novel framework for human-in-the-loop, iterative, abstractive related work generation.