Title
Neural Pipeline for Zero-Shot Data-to-Text Generation
Authors
Abstract
In data-to-text (D2T) generation, training on in-domain data leads to overfitting to the data representation and repeating training data noise. We examine how to avoid finetuning pretrained language models (PLMs) on D2T generation datasets while still taking advantage of the surface realization capabilities of PLMs. Inspired by pipeline approaches, we propose to generate text by transforming single-item descriptions with a sequence of modules trained on general-domain text-based operations: ordering, aggregation, and paragraph compression. We train PLMs to perform these operations on WikiFluent, a synthetic corpus we build from English Wikipedia. Our experiments on two major triple-to-text datasets -- WebNLG and E2E -- show that our approach enables D2T generation from RDF triples in zero-shot settings.
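The pipeline described in the abstract composes three modules in sequence: ordering, aggregation, and paragraph compression. The following is a minimal sketch of that data flow only; in the paper each stage is a trained PLM, whereas here every function (`order`, `aggregate`, `compress`, `generate`) is a hypothetical placeholder with trivial behavior, and the example facts are illustrative single-item descriptions.

```python
# Sketch of the three-stage pipeline from the abstract. Each stage would be
# a trained PLM in the actual system; these placeholders only show how the
# single-item descriptions flow through ordering -> aggregation -> compression.

def order(descriptions):
    # Ordering module: choose a natural order for the facts.
    # Placeholder: keep the input order unchanged.
    return list(descriptions)

def aggregate(ordered):
    # Aggregation module: decide which adjacent descriptions to merge.
    # Placeholder: merge everything into a single paragraph.
    return [" ".join(ordered)]

def compress(paragraphs):
    # Paragraph-compression module: fuse each paragraph into fluent text.
    # Placeholder: return the paragraphs verbatim.
    return " ".join(paragraphs)

def generate(single_item_descriptions):
    """Compose the three modules into one zero-shot D2T pipeline."""
    return compress(aggregate(order(single_item_descriptions)))

# One simple sentence per input RDF triple (hypothetical example data).
facts = [
    "Alan Bean was born in Wheeler, Texas.",
    "Alan Bean was a crew member of Apollo 12.",
]
print(generate(facts))
```

With trained modules in place of the placeholders, the compression stage would also fuse the sentences (e.g. into "Alan Bean, born in Wheeler, Texas, was a crew member of Apollo 12."); the sketch only fixes the interfaces between stages.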