Paper Title

Finding Friends and Flipping Frenemies: Automatic Paraphrase Dataset Augmentation Using Graph Theory

Authors

Hannah Chen, Yangfeng Ji, David Evans

Abstract

Most NLP datasets are manually labeled, and so suffer from inconsistent labeling or limited size. We propose methods for automatically improving datasets by viewing them as graphs with expected semantic properties. We construct a paraphrase graph from the provided sentence pair labels, and create an augmented dataset by directly inferring labels from the original sentence pairs using a transitivity property. We use structural balance theory to identify likely mislabelings in the graph, and flip their labels. We evaluate our methods on paraphrase models trained using these datasets starting from a pretrained BERT model, and find that the automatically enhanced training sets result in more accurate models.
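The two graph ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `infer_labels` and `unbalanced_triangles`, the label encoding (1 for paraphrase, 0 for non-paraphrase), and the specific rule for flagging a triangle (exactly two paraphrase edges and one non-paraphrase edge) are all assumptions made for illustration. The abstract does not say how the paper chooses which label in a flagged triangle to flip, so the sketch stops at detection.

```python
from itertools import combinations


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def infer_labels(pairs):
    """Infer labels for unlabeled sentence pairs (hypothetical helper).

    pairs: list of (sentence_a, sentence_b, label), label 1 = paraphrase,
    0 = non-paraphrase. Returns {frozenset({a, b}): inferred_label}.
    """
    parent = {}
    for a, b, _ in pairs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
    # Sentences connected by paraphrase edges form one cluster.
    for a, b, label in pairs:
        if label == 1:
            parent[find(parent, a)] = find(parent, b)
    clusters = {}
    for s in parent:
        clusters.setdefault(find(parent, s), []).append(s)

    known = {frozenset((a, b)) for a, b, _ in pairs}
    inferred = {}
    # Transitivity: any two sentences in the same cluster are paraphrases.
    for members in clusters.values():
        for a, b in combinations(members, 2):
            key = frozenset((a, b))
            if key not in known:
                inferred[key] = 1
    # A non-paraphrase edge between two clusters extends to all cross pairs.
    for a, b, label in pairs:
        if label == 0:
            ra, rb = find(parent, a), find(parent, b)
            if ra == rb:
                continue  # conflicting labels inside one cluster; skip here
            for x in clusters[ra]:
                for y in clusters[rb]:
                    key = frozenset((x, y))
                    if key not in known:
                        inferred.setdefault(key, 0)
    return inferred


def unbalanced_triangles(pairs):
    """Return fully labeled triangles with exactly two paraphrase edges.

    Assumed rule: such a triangle contradicts transitivity, so at least
    one of its three labels is likely wrong.
    """
    label = {frozenset((a, b)): l for a, b, l in pairs}
    nodes = sorted({s for key in label for s in key})
    flagged = []
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in label for e in edges) and sum(label[e] for e in edges) == 2:
            flagged.append((a, b, c))
    return flagged
```

For example, given the labeled pairs `("s1", "s2", 1)`, `("s2", "s3", 1)`, and `("s3", "s4", 0)`, `infer_labels` adds a paraphrase label for (s1, s3) and non-paraphrase labels for (s1, s4) and (s2, s4). The union-find structure keeps cluster construction close to linear in the number of labeled pairs, while the triangle scan is cubic in the number of sentences and only suitable for small graphs.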
