Paper Title
Joint Open Knowledge Base Canonicalization and Linking
Paper Authors
Abstract
Open Information Extraction (OIE) methods extract a large number of OIE triples of the form (noun phrase, relation phrase, noun phrase) from text, and these triples compose large Open Knowledge Bases (OKBs). However, the noun phrases (NPs) and relation phrases (RPs) in OKBs are not canonicalized and often appear as different paraphrased textual variants, which leads to redundant and ambiguous facts. Two related tasks address this problem: OKB canonicalization (i.e., converting NPs and RPs into a canonicalized form) and OKB linking (i.e., linking NPs and RPs to their corresponding entities and relations in a curated Knowledge Base such as DBpedia). These two tasks are tightly coupled, and each can benefit significantly from the other; however, they have so far been studied in isolation. In this paper, we explore the task of joint OKB canonicalization and linking for the first time, and propose JOCL, a novel framework based on a factor graph model that makes the two tasks reinforce each other. JOCL is flexible enough to combine different signals from both tasks and can be extended to incorporate new signals. A thorough experimental study over two large-scale OIE triple datasets shows that our framework outperforms all baseline methods on OKB canonicalization in terms of average F1 and on OKB linking in terms of accuracy.
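The abstract does not spell out JOCL's factors, signals, or inference procedure. The following is therefore only a minimal sketch of the general idea of coupling the two tasks in one factor graph, not the authors' method. It assumes the following: two subject NPs from a toy OKB; hypothetical linking priors (`LINK_PRIOR`, with made-up numbers) over two DBpedia-style entity names; a string-similarity score from Python's `difflib` as the only canonicalization signal; and a hypothetical consistency factor that rewards "same cluster" exactly when both NPs link to the same entity. MAP inference is done by brute-force enumeration, which is only feasible for a handful of variables.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical toy OKB: two OIE triples whose subject NPs are paraphrases.
TRIPLES = [
    ("Barack Obama", "was born in", "Honolulu"),
    ("Obama", "was born in", "Honolulu"),
]

# Hypothetical linking priors P(entity | NP), e.g. from an alias dictionary
# built over a curated KB such as DBpedia. The numbers are made up.
LINK_PRIOR = {
    "Barack Obama": {"Barack_Obama": 0.9, "Michelle_Obama": 0.1},
    "Obama": {"Barack_Obama": 0.45, "Michelle_Obama": 0.55},
}
ENTITIES = ["Barack_Obama", "Michelle_Obama"]


def np_similarity(a: str, b: str) -> float:
    """Canonicalization signal (assumed): surface-form similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def consistency(same_cluster: bool, e1: str, e2: str, penalty: float = 0.1) -> float:
    """Hypothetical coupling factor: one cluster <-> one linked entity."""
    return 1.0 if same_cluster == (e1 == e2) else penalty


def joint_map(np1: str, np2: str) -> tuple:
    """Brute-force MAP over (entity of np1, entity of np2, same-cluster flag)."""
    sim = np_similarity(np1, np2)
    best, best_score = None, -1.0
    for e1, e2, same in product(ENTITIES, ENTITIES, (True, False)):
        # Unnormalized joint potential: product of all factor values.
        score = (
            LINK_PRIOR[np1][e1]              # linking factor for np1
            * LINK_PRIOR[np2][e2]            # linking factor for np2
            * (sim if same else 1.0 - sim)   # canonicalization factor
            * consistency(same, e1, e2)      # factor tying the two tasks together
        )
        if score > best_score:
            best, best_score = (e1, e2, same), score
    return best


def isolated(np1: str, np2: str, threshold: float = 0.5) -> tuple:
    """Each task on its own: argmax prior for linking, threshold for clustering."""
    def link(np: str) -> str:
        return max(LINK_PRIOR[np], key=LINK_PRIOR[np].get)

    return link(np1), link(np2), np_similarity(np1, np2) > threshold


if __name__ == "__main__":
    np1, np2 = TRIPLES[0][0], TRIPLES[1][0]
    print("isolated:", isolated(np1, np2))
    print("joint:   ", joint_map(np1, np2))
```

Run as a script, the isolated baseline links "Obama" to `Michelle_Obama` while still clustering it with "Barack Obama", which is an inconsistent pair. Once the coupling factor is added, the highest-scoring joint assignment links both NPs to `Barack_Obama` and keeps them in one cluster. This is a small instance of the two tasks reinforcing each other. At OKB scale, exhaustive enumeration is infeasible and some form of approximate inference would be needed.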