Finding Friends and Flipping Frenemies: Automatic Paraphrase Dataset Augmentation Using Graph Theory

Paper title

Finding Friends and Flipping Frenemies: Automatic Paraphrase Dataset Augmentation Using Graph Theory

Authors

Hannah Chen, Yangfeng Ji, David Evans

Abstract

Most NLP datasets are manually labeled, and so suffer from inconsistent labeling or limited size. We propose methods for automatically improving datasets by viewing them as graphs with expected semantic properties. We construct a paraphrase graph from the provided sentence pair labels, and create an augmented dataset by directly inferring labels from the original sentence pairs using a transitivity property. We use structural balance theory to identify likely mislabelings in the graph, and flip their labels. We evaluate our methods on paraphrase models trained using these datasets, starting from a pretrained BERT model, and find that the automatically enhanced training sets result in more accurate models.
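The two graph ideas named in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it assumes sentence pairs arrive as (sentence_a, sentence_b, label) tuples with label 1 for paraphrase and 0 for non-paraphrase, and the function names infer_positive_pairs and unbalanced_triangles are hypothetical. Transitivity is approximated by grouping sentences connected through paraphrase edges, and a triangle is treated as a likely labeling conflict when exactly two of its three edges are paraphrase edges, which is one common reading of structural balance.

```python
from itertools import combinations


def infer_positive_pairs(labeled_pairs):
    """Infer unlabeled paraphrase pairs via transitivity of positive labels.

    Assumption: labeled_pairs is an iterable of (a, b, label) tuples,
    with label 1 = paraphrase and 0 = non-paraphrase.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, label in labeled_pairs:
        root_a, root_b = find(a), find(b)
        if label == 1:
            parent[root_a] = root_b  # merge the two paraphrase clusters

    # Group sentences by the cluster they ended up in.
    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), []).append(node)

    known = {frozenset((a, b)) for a, b, _ in labeled_pairs}
    inferred = set()
    for members in clusters.values():
        for a, b in combinations(sorted(members), 2):
            if frozenset((a, b)) not in known:
                inferred.add((a, b))  # implied paraphrase, not in the data
    return inferred


def unbalanced_triangles(labeled_pairs):
    """Return fully labeled triangles with exactly one non-paraphrase edge.

    Such a triangle says A~B and B~C but not A~C, so at least one of the
    three labels is probably wrong. Which edge to flip is left open here.
    """
    sign = {frozenset((a, b)): label for a, b, label in labeled_pairs}
    nodes = sorted({n for edge in sign for n in edge})
    flagged = []
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in sign for e in edges):
            if sum(sign[e] == 1 for e in edges) == 2:
                flagged.append((a, b, c))
    return flagged


if __name__ == "__main__":
    pairs = [("s1", "s2", 1), ("s2", "s3", 1), ("s3", "s4", 1), ("s1", "s3", 0)]
    print(sorted(infer_positive_pairs(pairs)))
    print(unbalanced_triangles(pairs))
```

The brute-force triangle scan is cubic in the number of sentences, which is acceptable for a toy example; a real dataset would call for iterating over shared neighbors of each edge instead.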
