Paper Title

Random Text Perturbations Work, but not Always

Authors

Wang, Zhengxiang

Abstract

We present three large-scale experiments on a binary text matching classification task, in both Chinese and English, to evaluate the effectiveness and generalizability of random text perturbations as a data augmentation approach for NLP. We find that the augmentation can have both negative and positive effects on the test set performance of three neural classification models, depending on whether the models are trained on enough original training examples. This remains true whether the five random text editing operations used to augment text are applied together or separately. Our study strongly suggests that the effectiveness of random text perturbations is task specific and not generally positive.
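The abstract refers to five random text editing operations without enumerating them here. As an illustration only, the sketch below implements two common random edits (token swap and token deletion) in Python; the function names, operation choices, and parameters are assumptions for demonstration, not the paper's actual implementation.

```python
import random


def random_swap(tokens, n=1):
    """Swap two randomly chosen token positions, n times."""
    tokens = tokens[:]
    for _ in range(n):
        if len(tokens) < 2:
            break
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p=0.1):
    """Delete each token independently with probability p, keeping at least one."""
    kept = [t for t in tokens if random.random() > p]
    return kept if kept else [random.choice(tokens)]


def augment(sentence, n_aug=2, seed=None):
    """Generate n_aug randomly perturbed variants of a whitespace-tokenized sentence."""
    if seed is not None:
        random.seed(seed)
    tokens = sentence.split()
    variants = []
    for _ in range(n_aug):
        op = random.choice([random_swap, random_deletion])
        variants.append(" ".join(op(tokens)))
    return variants
```

A typical usage would generate a few perturbed copies of each training sentence (e.g. `augment("the quick brown fox jumps", n_aug=3)`) and add them to the training set alongside the originals.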
