Title
An Analysis of Simple Data Augmentation for Named Entity Recognition
Authors
Abstract
Simple yet effective data augmentation techniques have been proposed for sentence-level and sentence-pair natural language processing tasks. Inspired by these efforts, we design and compare data augmentation for named entity recognition, which is usually modeled as a token-level sequence labeling problem. Through experiments on two data sets from the biomedical and materials science domains (i2b2-2010 and MaSciP), we show that simple augmentation can boost performance for both recurrent and transformer-based models, especially for small training sets.
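Because named entity recognition assigns one label per token, an augmentation must keep tokens and labels aligned. The sketch below illustrates one simple scheme under stated assumptions: labels use BIO encoding, and each entity mention may be swapped for another mention of the same type drawn from the training set. The abstract does not specify which augmentations were used, so this is not necessarily one of them. The function names (`extract_mentions`, `build_mention_bank`, `replace_mentions`), the probability `p`, and the example sentences are hypothetical.

```python
import random
from collections import defaultdict


def extract_mentions(tokens, labels):
    """Return (start, end, type) spans from BIO labels; end is exclusive."""
    spans, start, etype = [], None, None
    for i, lab in enumerate(labels):
        # Close the open span unless this token continues it.
        if start is not None and lab != "I-" + etype:
            spans.append((start, i, etype))
            start, etype = None, None
        if lab.startswith("B-"):
            start, etype = i, lab[2:]
    if start is not None:
        spans.append((start, len(labels), etype))
    return spans


def build_mention_bank(corpus):
    """Group every mention in the corpus by its entity type."""
    bank = defaultdict(list)
    for tokens, labels in corpus:
        for s, e, t in extract_mentions(tokens, labels):
            bank[t].append(tokens[s:e])
    return bank


def replace_mentions(tokens, labels, bank, p=0.5, rng=random):
    """With probability p, swap each mention for a same-type mention.

    Labels are regenerated for the new mention so that tokens and
    labels stay aligned even when the mention length changes.
    """
    new_tokens, new_labels, cursor = [], [], 0
    for s, e, t in extract_mentions(tokens, labels):
        # Copy the tokens between mentions unchanged.
        new_tokens += tokens[cursor:s]
        new_labels += labels[cursor:s]
        if bank.get(t) and rng.random() < p:
            mention = rng.choice(bank[t])
        else:
            mention = tokens[s:e]
        new_tokens += mention
        new_labels += ["B-" + t] + ["I-" + t] * (len(mention) - 1)
        cursor = e
    new_tokens += tokens[cursor:]
    new_labels += labels[cursor:]
    return new_tokens, new_labels


# Hypothetical toy corpus; the sentences and labels are illustrative only.
corpus = [
    (["Patient", "denies", "chest", "pain", "."],
     ["O", "O", "B-problem", "I-problem", "O"]),
    (["Started", "on", "aspirin", "for", "fever", "."],
     ["O", "O", "B-treatment", "O", "B-problem", "O"]),
]
bank = build_mention_bank(corpus)
aug_tokens, aug_labels = replace_mentions(*corpus[0], bank, p=1.0,
                                          rng=random.Random(0))
```

Regenerating the labels from the replacement mention, rather than copying the old ones, is what lets a two-token mention be replaced by a one-token mention without producing a misaligned or ill-formed tag sequence.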