An Analysis of Simple Data Augmentation for Named Entity Recognition

Paper title

An Analysis of Simple Data Augmentation for Named Entity Recognition

Authors

Xiang Dai, Heike Adel

Abstract

Simple yet effective data augmentation techniques have been proposed for sentence-level and sentence-pair natural language processing tasks. Inspired by these efforts, we design and compare data augmentation for named entity recognition, which is usually modeled as a token-level sequence labeling problem. Through experiments on two data sets from the biomedical and materials science domains (i2b2-2010 and MaSciP), we show that simple augmentation can boost performance for both recurrent and transformer-based models, especially for small training sets.
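The abstract does not say which augmentations were compared, so the following is only an illustrative sketch of what a simple, label-preserving augmentation for token-level sequence labeling can look like. It assumes BIO-tagged sentences and swaps an entity mention for another mention of the same type drawn from the training data. The function names, the `p` parameter, and the example sentences are all hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def extract_mentions(labels):
    """Return (start, end, type) spans from BIO labels; end is exclusive."""
    spans = []
    start, ent_type = None, None
    for i, label in enumerate(labels):
        continues = label.startswith("I-") and label[2:] == ent_type
        # Close the open span when the current label does not continue it.
        if start is not None and not continues:
            spans.append((start, i, ent_type))
            start, ent_type = None, None
        if label.startswith("B-"):
            start, ent_type = i, label[2:]
    if start is not None:
        spans.append((start, len(labels), ent_type))
    return spans


def build_mention_pool(corpus):
    """Collect every mention in the corpus, grouped by entity type."""
    pool = defaultdict(list)
    for tokens, labels in corpus:
        for start, end, ent_type in extract_mentions(labels):
            pool[ent_type].append(tokens[start:end])
    return pool


def replace_mentions(tokens, labels, pool, p=0.5, rng=None):
    """With probability p, swap each mention for a same-type mention from pool.

    Labels are regenerated so that tokens and labels stay aligned even when
    the replacement mention has a different length.
    """
    rng = rng or random.Random()
    new_tokens, new_labels = [], []
    cursor = 0
    for start, end, ent_type in extract_mentions(labels):
        # Copy the untouched tokens before this mention.
        new_tokens.extend(tokens[cursor:start])
        new_labels.extend(labels[cursor:start])
        mention = tokens[start:end]
        if pool.get(ent_type) and rng.random() < p:
            mention = rng.choice(pool[ent_type])
        new_tokens.extend(mention)
        new_labels.extend(["B-" + ent_type] + ["I-" + ent_type] * (len(mention) - 1))
        cursor = end
    # Copy whatever follows the last mention.
    new_tokens.extend(tokens[cursor:])
    new_labels.extend(labels[cursor:])
    return new_tokens, new_labels


# Made-up example sentences, for illustration only.
corpus = [
    (["Patient", "denies", "chest", "pain", "."],
     ["O", "O", "B-problem", "I-problem", "O"]),
    (["History", "of", "diabetes", "."],
     ["O", "O", "B-problem", "O"]),
]
pool = build_mention_pool(corpus)
augmented = replace_mentions(*corpus[1], pool, p=1.0, rng=random.Random(0))
```

Because the replacement is drawn from mentions of the same type, the augmented sentence keeps a valid BIO labeling, which is the main constraint that separates token-level augmentation from sentence-level augmentation.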

Paper title