PeerDA: Data Augmentation via Modeling Peer Relation for Span Identification Tasks

Paper Title

PeerDA: Data Augmentation via Modeling Peer Relation for Span Identification Tasks

Authors

Weiwen Xu, Xin Li, Yang Deng, Wai Lam, Lidong Bing

Abstract

Span identification aims at identifying specific text spans from text input and classifying them into pre-defined categories. Different from previous works that merely leverage the Subordinate (SUB) relation (i.e. if a span is an instance of a certain category) to train models, this paper for the first time explores the Peer (PR) relation, which indicates that two spans are instances of the same category and share similar features. Specifically, a novel Peer Data Augmentation (PeerDA) approach is proposed which employs span pairs with the PR relation as the augmentation data for training. PeerDA has two unique advantages: (1) There are a large number of PR span pairs for augmenting the training data. (2) The augmented data can prevent the trained model from over-fitting the superficial span-category mapping by pushing the model to leverage the span semantics. Experimental results on ten datasets over four diverse tasks across seven domains demonstrate the effectiveness of PeerDA. Notably, PeerDA achieves state-of-the-art results on six of them.
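As a rough illustration of the PR relation described in the abstract, the sketch below enumerates span pairs that share a category from a small labeled set. The function name `peer_pairs`, the `(text, category)` input format, and the toy labels are assumptions made for this example; the abstract does not specify how PeerDA constructs or consumes these pairs during training.

```python
from collections import defaultdict
from itertools import combinations


def peer_pairs(labeled_spans):
    """Return every unordered pair of distinct spans that share a category.

    `labeled_spans` is a list of (span_text, category) tuples; this input
    format is hypothetical and chosen only for illustration.
    """
    by_category = defaultdict(list)
    for text, category in labeled_spans:
        # Skip repeated mentions so a span is never paired with itself.
        if text not in by_category[category]:
            by_category[category].append(text)

    pairs = []
    for category, spans in by_category.items():
        for first, second in combinations(spans, 2):
            pairs.append((first, second, category))
    return pairs


# Toy data: three location spans and one person span.
spans = [("Paris", "LOC"), ("Berlin", "LOC"), ("Wai Lam", "PER"), ("Tokyo", "LOC")]
pairs = peer_pairs(spans)
```

With n distinct spans in a category, this yields n(n-1)/2 pairs for that category, which is one way to see why PR pairs can be far more numerous than the original span-category labels.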
