Paper Title

CycleGT: Unsupervised Graph-to-Text and Text-to-Graph Generation via Cycle Training

Paper Authors

Qipeng Guo, Zhijing Jin, Xipeng Qiu, Weinan Zhang, David Wipf, Zheng Zhang

Paper Abstract

Two important tasks at the intersection of knowledge graphs and natural language processing are graph-to-text (G2T) and text-to-graph (T2G) conversion. Due to the difficulty and high cost of data collection, the supervised data available in the two fields is usually on the order of tens of thousands, for example, 18K in the WebNLG 2017 dataset after preprocessing, far fewer than the millions of examples available for other tasks such as machine translation. Consequently, deep learning models for G2T and T2G suffer heavily from scarce training data. We present CycleGT, an unsupervised training method that can bootstrap from fully non-parallel graph and text data and iteratively back-translate between the two forms. Experiments on WebNLG datasets show that our unsupervised model, trained on the same amount of data, achieves performance on par with several fully supervised models. Further experiments on the non-parallel GenWiki dataset verify that our method performs best among unsupervised baselines. This validates our framework as an effective approach to overcoming the data scarcity problem in the fields of G2T and T2G. Our code is available at https://github.com/QipengGuo/CycleGT.
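
The core of the method, as the abstract describes it, is a pair of back-translation cycles in which each model produces pseudo-parallel training pairs for the other, so that no aligned (graph, text) data is ever required. Below is a minimal sketch of that cycle-training loop. The G2TModel and T2GModel classes and their methods are illustrative stand-ins, not the paper's actual architectures or training objectives; the real implementation is in the repository linked above.

```python
# Minimal sketch of cycle training between non-parallel graphs and texts.
# All classes and method signatures here are hypothetical placeholders.

from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


class G2TModel:
    """Stand-in graph-to-text model (a real one is a neural generator)."""

    def generate(self, graph: List[Triple]) -> str:
        # Placeholder: naively verbalize each triple.
        return " . ".join(f"{s} {r} {o}" for s, r, o in graph)

    def train_step(self, graph: List[Triple], text: str) -> float:
        # Placeholder: supervised loss on one (graph, text) pair.
        return 0.0


class T2GModel:
    """Stand-in text-to-graph model (a real one predicts entity relations)."""

    def extract(self, text: str) -> List[Triple]:
        # Placeholder: a real model extracts triples from the sentence.
        return []

    def train_step(self, text: str, graph: List[Triple]) -> float:
        # Placeholder: supervised loss on one (text, graph) pair.
        return 0.0


def cycle_training_epoch(g2t: G2TModel, t2g: T2GModel,
                         graphs: List[List[Triple]],
                         texts: List[str]) -> None:
    # Text cycle: T2G synthesizes a pseudo graph for an unpaired text,
    # and G2T is trained to reconstruct the original text from it.
    for text in texts:
        pseudo_graph = t2g.extract(text)
        g2t.train_step(pseudo_graph, text)

    # Graph cycle: G2T synthesizes a pseudo text for an unpaired graph,
    # and T2G is trained to reconstruct the original graph from it.
    for graph in graphs:
        pseudo_text = g2t.generate(graph)
        t2g.train_step(pseudo_text, graph)
```

Because each direction is supervised only by the other direction's predictions measured against the original unpaired data, iterating the two cycles is the back-translation bootstrap the abstract refers to.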
