Paper Title

Enhancing deep neural networks with morphological information

Paper Authors

Matej Klemen, Luka Krsnik, Marko Robnik-Šikonja

Paper Abstract

Deep learning approaches are superior in NLP due to their ability to extract informative features and patterns from language. The two most successful neural architectures are LSTMs and transformers, the latter used in large pretrained language models such as BERT. While cross-lingual approaches are on the rise, most current NLP techniques are designed for and applied to English, and less-resourced languages lag behind. In morphologically rich languages, information is conveyed through morphology, e.g., through affixes modifying the stems of words. Existing neural approaches do not explicitly use this information on word morphology. We analyse the effect of adding morphological features to LSTM and BERT models. As a testbed, we use three tasks available in many less-resourced languages: named entity recognition (NER), dependency parsing (DP), and comment filtering (CF). We construct baselines involving LSTM and BERT models, which we adjust by adding additional input in the form of part-of-speech (POS) tags and universal features. We compare the models across several languages from different language families. Our results suggest that the effect of adding morphological features depends on the quality of the features and on the task: the features improve the performance of LSTM-based models on the NER and DP tasks, while they do not benefit performance on the CF task. For BERT-based models, morphological features improve performance on DP only when they are of high quality, and show no practical improvement when they are predicted. Even for high-quality features, the improvements are less pronounced in language-specific BERT variants than in massively multilingual BERT models. As manually checked features are not available for the NER and CF datasets, we experiment there only with predicted features and find that they do not bring any practical improvement in performance.
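To make the described adjustment concrete, below is a minimal sketch of one common way to feed morphological information (here, POS tags) to an LSTM alongside word embeddings, by concatenating the two embedding vectors per token. This is an illustrative assumption written in PyTorch, not the authors' code: the class name MorphLSTMTagger, all hyperparameters, and the embedding-concatenation scheme are hypothetical choices consistent with the abstract's description.

```python
# Hypothetical sketch (not the authors' implementation): an LSTM tagger
# that receives POS tags as additional input by concatenating a POS tag
# embedding with the word embedding for each token.
import torch
import torch.nn as nn

class MorphLSTMTagger(nn.Module):
    def __init__(self, vocab_size, pos_size, num_labels,
                 word_dim=100, pos_dim=16, hidden_dim=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.pos_emb = nn.Embedding(pos_size, pos_dim)  # morphological input
        self.lstm = nn.LSTM(word_dim + pos_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, word_ids, pos_ids):
        # Concatenate word and POS embeddings per token before the LSTM.
        x = torch.cat([self.word_emb(word_ids), self.pos_emb(pos_ids)], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)  # per-token label scores, e.g. for NER

# Usage: a batch of 2 sentences of length 5 with random word/POS ids.
model = MorphLSTMTagger(vocab_size=1000, pos_size=18, num_labels=9)
words = torch.randint(0, 1000, (2, 5))
pos = torch.randint(0, 18, (2, 5))
print(model(words, pos).shape)  # torch.Size([2, 5, 9])
```

Concatenation at the input layer is only one plausible integration point; the same morphological features could also be injected into a BERT-based model, but the abstract does not specify the exact mechanism, so this sketch should be read as a generic illustration.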
