Paper title
Generalizing Cross-Document Event Coreference Resolution Across Multiple Corpora
Paper authors
Paper abstract
Cross-document event coreference resolution (CDCR) is an NLP task in which mentions of events need to be identified and clustered throughout a collection of documents. CDCR aims to benefit downstream multi-document applications, but despite recent progress on corpora and system development, downstream improvements from applying CDCR have not been shown yet. We observe that every CDCR system to date was developed, trained, and tested only on a single respective corpus. This raises strong concerns about their generalizability -- a must-have for downstream applications where the magnitude of domains or event mentions is likely to exceed those found in a curated corpus. To investigate this assumption, we define a uniform evaluation setup involving three CDCR corpora: ECB+, the Gun Violence Corpus, and the Football Coreference Corpus (which we reannotate at the token level to make our analysis possible). We compare a corpus-independent, feature-based system against a recent neural system developed for ECB+. While inferior in absolute numbers, the feature-based system shows more consistent performance across all corpora, whereas the neural system is hit-and-miss. Via model introspection, we find that the importance of event actions, event time, etc. for resolving coreference in practice varies greatly between the corpora. Additional analysis shows that several systems overfit to the structure of the ECB+ corpus. We conclude with recommendations on how to achieve generally applicable CDCR systems in the future -- the most important being that evaluation on multiple CDCR corpora is strongly necessary. To facilitate future research, we release our dataset, annotation guidelines, and system implementation to the public.