Paper title

Understood in Translation, Transformers for Domain Understanding

Authors

Dimitrios Christofidellis, Matteo Manica, Leonidas Georgopoulos, Hans Vandierendonck

Abstract

Knowledge acquisition is the essential first step of any Knowledge Graph (KG) application. This knowledge can be extracted from a given corpus (KG generation process) or specified from an existing KG (KG specification process). For domain-specific solutions, knowledge acquisition is a labor-intensive task usually orchestrated and supervised by subject matter experts. Specifically, the domain of interest is usually defined manually, and then the needed generation or extraction tools are used to produce the KG. Herein, we propose a supervised machine learning method, based on Transformers, for domain definition of a corpus. We argue that such automated definition of the domain's structure is beneficial both in terms of construction time and quality of the generated graph. The proposed method is extensively validated on three public datasets (WebNLG, NYT and DocRED) by comparing it with two reference methods based on CNN and RNN models. The evaluation shows the efficiency of our model in this task. Focusing on scientific document understanding, we present a new health domain dataset based on publications extracted from PubMed and successfully apply our method to it. Lastly, we demonstrate how this work lays the foundation for fully automated and unsupervised KG generation.
