Paper Title
Deep Active Learning for Sequence Labeling Based on Diversity and Uncertainty in Gradient
Paper Authors
Paper Abstract
Recently, several studies have investigated active learning (AL) for natural language processing tasks to alleviate data dependency. However, for query selection, most of these studies rely mainly on uncertainty-based sampling, which generally does not exploit the structural information of the unlabeled data. This leads to sampling bias in the batch active learning setting, where several samples are selected at once. In this work, we demonstrate that the amount of labeled training data required for sequence labeling can be reduced by using active learning that incorporates both uncertainty and diversity. We examine the effects of our sequence-based approach, which selects weighted, diverse samples in the gradient embedding space, across multiple tasks, datasets, and models, and show that it consistently outperforms classic uncertainty-based sampling and diversity-based sampling.
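
The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one common way to combine uncertainty and diversity in a gradient embedding space: last-layer gradient embeddings with pseudo-labels, followed by k-means++ style seeding. The function names `gradient_embeddings` and `kmeanspp_select`, the use of the predicted label as a pseudo-label, and the assumption that each candidate is already summarized by one probability vector and one feature vector are all illustrative choices, not the authors' confirmed method. How token-level predictions are aggregated into one embedding per sentence is left open here.

```python
import numpy as np


def gradient_embeddings(probs, hidden):
    """Hypothetical last-layer gradient embedding, one row per candidate.

    probs:  (n, k) predicted class probabilities.
    hidden: (n, d) penultimate-layer features.
    Assumption: the predicted class is used as a pseudo-label, so the
    embedding is (p - onehot(argmax p)) outer-multiplied with the features.
    Confident predictions give small-norm embeddings; uncertain ones give
    large-norm embeddings.
    """
    n, _ = probs.shape
    pseudo = np.zeros_like(probs)
    pseudo[np.arange(n), probs.argmax(axis=1)] = 1.0
    outer = (probs - pseudo)[:, :, None] * hidden[:, None, :]
    return outer.reshape(n, -1)


def kmeanspp_select(emb, batch_size, rng):
    """k-means++ style seeding over the embeddings.

    Each new point is sampled with probability proportional to its squared
    distance from the nearest already-selected point, which favors points
    that are both large in norm (uncertain) and far apart (diverse).
    """
    n = emb.shape[0]
    # Start from the largest-norm embedding (an assumption of this sketch).
    first = int(np.argmax(np.linalg.norm(emb, axis=1)))
    chosen = [first]
    d2 = np.sum((emb - emb[first]) ** 2, axis=1)
    while len(chosen) < min(batch_size, n):
        total = d2.sum()
        if total <= 0:
            # All remaining points coincide with a selected one; take any.
            remaining = np.setdiff1d(np.arange(n), chosen)
            nxt = int(remaining[0])
        else:
            nxt = int(rng.choice(n, p=d2 / total))
        chosen.append(nxt)
        d2 = np.minimum(d2, np.sum((emb - emb[nxt]) ** 2, axis=1))
    return chosen


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = rng.dirichlet(np.ones(3), size=20)   # toy predictions
    hidden = rng.normal(size=(20, 4))            # toy features
    emb = gradient_embeddings(probs, hidden)
    print(kmeanspp_select(emb, batch_size=5, rng=rng))
```

In this sketch the embedding norm plays the role of uncertainty and the distance-weighted sampling plays the role of diversity, so a single batch is less likely to contain many near-duplicate uncertain samples than plain uncertainty sampling would select.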