Applications of BERT Based Sequence Tagging Models on Chinese Medical Text Attributes Extraction

Paper title

Applications of BERT Based Sequence Tagging Models on Chinese Medical Text Attributes Extraction

Authors

Zhao, Gang; Zhang, Teng; Wang, Chenxiao; Lv, Ping; Wu, Ji

Abstract

We convert the Chinese medical text attribute extraction task into a sequence tagging or machine reading comprehension task. Building on BERT pre-trained models, we tried not only the widely used LSTM-CRF sequence tagging model but also other sequence models, such as CNN, UCNN, WaveNet, and self-attention, which reach performance similar to LSTM-CRF. This sheds light on traditional sequence tagging models. Because different sequence tagging models emphasize substantially different aspects, ensembling them adds diversity to the final system. With this approach, our system achieves good performance on Chinese medical text attribute extraction (subtask 2 of CCKS 2019 task 1).
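The abstract does not say which tagging scheme or ensembling rule the authors used. As a rough illustration only, the sketch below assumes character-level BIO tags and a simple per-position majority vote across models. The function names, the example sentence, and the labels `SITE` and `SIZE` are hypothetical and are not taken from the paper or the CCKS 2019 data.

```python
from collections import Counter


def spans_to_bio(text, spans):
    """Turn (start, end, label) character spans (end exclusive) into BIO tags.

    Assumption: one tag per character, which is a common choice for Chinese text.
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def majority_vote(predictions):
    """Combine equal-length tag sequences by voting at each position.

    Ties go to the tag that appears first among the models at that position.
    """
    voted = []
    for position_tags in zip(*predictions):
        voted.append(Counter(position_tags).most_common(1)[0][0])
    return voted


# Hypothetical example sentence and attribute labels.
text = "右肺上叶肿物3cm"
spans = [(0, 4, "SITE"), (6, 9, "SIZE")]
gold = spans_to_bio(text, spans)

# Three hypothetical model outputs: two agree, one predicts nothing.
model_outputs = [gold, gold, ["O"] * len(text)]
ensembled = majority_vote(model_outputs)
```

Per-position voting can produce an invalid tag sequence (for example an `I-` tag directly after `O`), so a real system would likely need a repair step or a span-level vote instead.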
