Paper Title
Adaptive Self-training for Few-shot Neural Sequence Labeling
Paper Authors
Paper Abstract
Sequence labeling is an important technique employed for many Natural Language Processing (NLP) tasks, such as Named Entity Recognition (NER), slot tagging for dialog systems, and semantic parsing. Large-scale pre-trained language models obtain very good performance on these tasks when fine-tuned on large amounts of task-specific labeled data. However, such large-scale labeled datasets are difficult to obtain for several tasks and domains due to the high cost of human annotation as well as privacy and data-access constraints for sensitive user applications. This is exacerbated for sequence labeling tasks, which require such annotations at the token level. In this work, we develop techniques to address the label-scarcity challenge for neural sequence labeling models. Specifically, we develop self-training and meta-learning techniques for training neural sequence taggers with few labels. While self-training serves as an effective mechanism to learn from large amounts of unlabeled data, meta-learning helps in adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels. Extensive experiments on six benchmark datasets, including two for massive multilingual NER and four slot tagging datasets for task-oriented dialog systems, demonstrate the effectiveness of our method. With only 10 labeled examples per class for each task, our method obtains a 10% improvement over state-of-the-art systems, demonstrating its effectiveness in the low-resource setting.
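The loop the abstract describes (a teacher pseudo-labels unlabeled data, pseudo-labeled tokens are re-weighted to suppress noise, and a student is retrained) can be illustrated with a short sketch. This is a minimal, assumption-laden illustration rather than the paper's implementation: the `Tagger` class, the toy tensors, and the per-token confidence weights (a simple stand-in for the meta-learned re-weighting the abstract describes) are all hypothetical.

```python
# Minimal self-training sketch with weighted pseudo-labels.
# NOT the paper's exact algorithm: the meta-learned re-weighting is
# approximated here by softmax-confidence weights on pseudo-labeled tokens.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tagger(nn.Module):
    """Toy token classifier standing in for a pre-trained encoder + head."""
    def __init__(self, dim=32, num_tags=5):
        super().__init__()
        self.head = nn.Linear(dim, num_tags)

    def forward(self, x):           # x: (batch, seq_len, dim)
        return self.head(x)         # logits: (batch, seq_len, num_tags)

def self_train(labeled_x, labeled_y, unlabeled_x, rounds=3):
    # 1. Fit a teacher on the few labeled examples.
    teacher = Tagger()
    opt = torch.optim.Adam(teacher.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        loss = F.cross_entropy(teacher(labeled_x).flatten(0, 1),
                               labeled_y.flatten())
        loss.backward()
        opt.step()
    for _ in range(rounds):
        # 2. Pseudo-label the unlabeled data with the current teacher.
        with torch.no_grad():
            probs = teacher(unlabeled_x).softmax(-1)
            conf, pseudo_y = probs.max(-1)   # per-token confidence + label
        # 3. Train a student; each pseudo-labeled token's loss is scaled by
        #    its confidence, down-weighting likely-noisy pseudo-labels.
        student = Tagger()
        opt = torch.optim.Adam(student.parameters(), lr=1e-2)
        for _ in range(200):
            opt.zero_grad()
            sup = F.cross_entropy(student(labeled_x).flatten(0, 1),
                                  labeled_y.flatten())
            tok_loss = F.cross_entropy(student(unlabeled_x).flatten(0, 1),
                                       pseudo_y.flatten(), reduction="none")
            pseudo = (conf.flatten() * tok_loss).mean()
            (sup + pseudo).backward()
            opt.step()
        teacher = student            # the student becomes the next teacher
    return teacher

# Toy data: 10 labeled sequences, 100 unlabeled ones (random stand-ins).
torch.manual_seed(0)
lx, ly = torch.randn(10, 8, 32), torch.randint(0, 5, (10, 8))
ux = torch.randn(100, 8, 32)
model = self_train(lx, ly, ux)
```

Replacing the fixed confidence weights with weights produced by a meta-learner, optimized so that the re-weighted pseudo-label loss improves performance on the small labeled set, would move this sketch closer to the adaptive re-weighting the abstract describes.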